DTSA 5504 - DATA MINING PIPELINE EXAM QUESTIONS AND ANSWERS

Document information

Course: DATA MINING
Type: Exam (elaborations)
Number of pages: 8
Uploaded on: March 26, 2025
Written in: 2024/2025


DTSA 5504 - DATA MINING PIPELINE
EXAM QUESTIONS AND ANSWERS
What are Two Types of Data Attributes? - Answer-1.) Categorical (nominal, binary,
ordinal)

2.) Numeric (discrete, continuous)

What are some kinds of Data Statistics? - Answer-Categorical: % of each value,
Numeric: central tendency, dispersion

What elements make up Central Tendency? - Answer-Mean, Median, Mode,
Midrange

What elements make up Dispersion? - Answer-Range (max - min), Quartiles
(Q1: 25%, Q3: 75%), IQR (Q3 - Q1), Variance, Standard Deviation
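These statistics can be computed directly with Python's standard library; a minimal sketch on made-up numbers (the sample values are mine):

```python
import statistics as st

# Illustrative sample; any list of numbers works.
x = [4, 8, 15, 16, 23, 42]

# Central tendency
mean = st.mean(x)                 # (4+8+15+16+23+42)/6 = 18
median = st.median(x)             # middle of the sorted values -> 15.5
mode = st.mode([1, 2, 2, 3])      # most frequent value -> 2
midrange = (min(x) + max(x)) / 2  # (min + max) / 2 -> 23.0

# Dispersion
data_range = max(x) - min(x)      # max - min -> 38
q1, _, q3 = st.quantiles(x, n=4)  # Q1 and Q3 (interpolated)
iqr = q3 - q1                     # interquartile range
variance = st.pvariance(x)        # population variance
stdev = st.pstdev(x)              # population standard deviation
```

Note that `statistics.quantiles` interpolates, so its Q1/Q3 can differ slightly from other quartile conventions.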

What are some examples of plot types for data visualization? - Answer-Boxplots,
histograms, scatterplots, pie, line, heatmap, word cloud, network, area, bubble

Object Similarity (data matrix) - Answer-n objects x p attributes

Object Dissimilarity (dissimilarity matrix) - Answer-n objects x n objects

Nominal Similarity - Answer-s=1 if x=y, otherwise s=0

Nominal Dissimilarity - Answer-d=0 if x=y, otherwise d=1

Binary Symmetry - Answer-Equal chance of Y or N

Binary Asymmetry - Answer-Y is less likely than N

Symmetric Variables Equation - Answer-d(i,j) = (r + s) / (q + r + s + t), where
q = # of attributes that are 1/1, r = 1/0, s = 0/1, t = 0/0

Asymmetric Variables Equation - Answer-d(i,j) = (r + s) / (q + r + s), ignoring
the 0/0 count t; sim(i,j) = q / (q + r + s) = 1 - d(i,j), the Jaccard coefficient

Jaccard coefficient - Answer-(q / (q+r+s))
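The two binary measures reduce to one small helper over the standard contingency counts q (1/1), r (1/0), s (0/1), t (0/0); the function names are mine:

```python
def binary_dissim(q, r, s, t, symmetric=True):
    """Dissimilarity between two binary objects from contingency counts:
    q = both 1, r = first only, s = second only, t = both 0."""
    if symmetric:
        return (r + s) / (q + r + s + t)  # simple matching
    return (r + s) / (q + r + s)          # asymmetric: ignore 0/0 matches

def jaccard(q, r, s):
    """Jaccard coefficient: sim(i,j) = q / (q + r + s) = 1 - asymmetric d."""
    return q / (q + r + s)
```

For example, objects (1,0,1,0) and (1,1,0,0) give q=1, r=1, s=1, t=1: symmetric d = 0.5, asymmetric d = 2/3, Jaccard similarity = 1/3.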

Ordinal Attributes - Answer-for all r(if) in {1,...,Mf}, z(if) = (r(if) - 1)/(Mf - 1)
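In code the ordinal mapping is a one-liner (the helper name is mine):

```python
def ordinal_to_unit(r, M):
    """Map rank r in {1, ..., M} onto [0, 1]: z = (r - 1) / (M - 1)."""
    return (r - 1) / (M - 1)
```

e.g. with M = 5 levels, ranks 1, 3, 5 map to 0.0, 0.5, 1.0, after which numeric distance measures apply.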

Numeric Object Dissimilarity - Answer-Usually measured by distance, e.g. the
Minkowski distance (L_p norm)

Minkowski Distance - Answer-d(i,j) = (|x(i1) - x(j1)|^p + ... + |x(ip) - x(jp)|^p)^(1/p),
where p = 1 gives the Manhattan distance and p = 2 the Euclidean distance
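The formula translates directly into a few lines of Python (the function name is mine); p = 1 gives Manhattan distance, p = 2 Euclidean:

```python
def minkowski(x, y, p):
    """Minkowski (L_p) distance between equal-length numeric vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)
```

e.g. between (0, 0) and (3, 4): Manhattan distance 7, Euclidean distance 5.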

Distance Measure Properties - Answer-d(i,j) >= 0 (non-negativity), d(i,i) = 0,
d(i,j) = d(j,i) (symmetry), d(i,j) <= d(i,k) + d(k,j) (triangle inequality)

cosine similarity - Answer-cos(A,B) = (A . B) / (||A|| ||B||), where
||A|| = sqrt(sum(A(i)^2)) and ||B|| = sqrt(sum(B(i)^2))
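The cosine formula as a plain function (the name is mine):

```python
import math

def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (||A|| * ||B||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Orthogonal vectors score 0; parallel vectors score 1 regardless of magnitude, which is part of why cosine works well for sparse data.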

What operations are involved with sequential data and time series? - Answer-
Euclidean matching, dynamic time warping, minimum jump cost
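Euclidean (lock-step) matching compares position i only with position i, while dynamic time warping lets the alignment stretch. A minimal dynamic-programming sketch of both (function names are mine; minimum jump cost is not shown):

```python
import math

def lockstep(a, b):
    """Euclidean matching: compare positions i-to-i only."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Dynamic time warping distance with |.| as the local cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: diagonal match, insertion, deletion
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

DTW of [1, 2, 3] against its stretched copy [1, 1, 2, 2, 3, 3] is 0, whereas lock-step matching cannot even compare sequences of different lengths without truncation.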

Mixed Attribute Types - Answer-Weighted sum across attributes:
d(i,j) = sum_f(delta(ij,f) * d(ij,f)) / sum_f(delta(ij,f)), where delta(ij,f) = 0
when attribute f is missing for i or j (or is an ignored 0/0 asymmetric match)
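The weighted combination can be sketched as follows, with the delta terms supplied as 0/1 indicator weights (the function name is mine):

```python
def mixed_dissim(per_attr_d, indicators):
    """d(i,j) = sum_f(delta_f * d_f) / sum_f(delta_f).
    indicators[f] = 0 when attribute f should be skipped
    (missing value, or an ignored 0/0 asymmetric match)."""
    num = sum(d * w for d, w in zip(per_attr_d, indicators))
    return num / sum(indicators)
```

e.g. per-attribute dissimilarities [0.5, 1.0, 0.0] with the third attribute missing (indicators [1, 1, 0]) give (0.5 + 1.0) / 2 = 0.75.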

When to use Euclidean/Manhattan distances? - Answer-Dense, continuous data

When to ignore 0/0 (null/null) matches? - Answer-Asymmetric binary attributes

When to use cosine similarity or Jaccard similarity? - Answer-sparse data

When to use seasonal patterns or subgroups? - Answer-Subset data

In a boxplot, what does the IQR represent? - Answer-The height of the box in the
boxplot

In what ways can one transform data? - Answer-Smoothing, aggregation,
generalization, normalization, discretization, attribute construction

Formula for min-max normalization - Answer-v' = (v-min)/(max-min) * (max' - min') +
min'

Formula for mean normalization - Answer-v' = (v-mean)/(max-min)

Formula for standardized normalization - Answer-v' = (v-mean)/stdev
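The three normalization formulas side by side as plain functions (the names and the new_min/new_max defaults are mine):

```python
def min_max(v, vmin, vmax, new_min=0.0, new_max=1.0):
    """Min-max: rescale v from [vmin, vmax] into [new_min, new_max]."""
    return (v - vmin) / (vmax - vmin) * (new_max - new_min) + new_min

def mean_norm(v, mean, vmin, vmax):
    """Mean normalization: center on the mean, scale by the range."""
    return (v - mean) / (vmax - vmin)

def z_score(v, mean, stdev):
    """Standardized (z-score) normalization."""
    return (v - mean) / stdev
```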

What does discretization involve? - Answer-Continuous->intervals, Split or merge,
Supervised or unsupervised labels

Methods for Unsupervised Discretization - Answer-Binning/Histogram Analysis,
Clustering Analysis, Intuitive partitioning
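The simplest of these, equal-width binning, can be sketched in a few lines (the helper name is mine):

```python
def equal_width_bins(values, k):
    """Assign each value a bin index in 0..k-1 using k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # clamp the maximum value into the last bin
    return [min(int((v - lo) / width), k - 1) for v in values]
```

Equal-width bins are sensitive to outliers; equal-frequency (equal-depth) binning or clustering handles skewed data better.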

Properties for Supervised Discretization - Answer-Pre-determined class labels,
entropy-based interval splitting, chi-square (X^2) analysis-based interval merging

Properties of Data Reduction - Answer-Dimensionality reduction = attributes,
numerosity reduction = objects

Properties of Attribute Selection - Answer-Forward selection, Backward elimination,
Feature engineering

Feature engineering - Answer-The process of determining which features might be
useful in training a model, and then converting raw data from log files and other
sources into those features
