DTSA 5504 - DATA MINING PIPELINE
EXAM QUESTIONS AND ANSWERS
What are Two Types of Data Attributes? - Answer-1.) Categorical (nominal, binary,
ordinal)
2.) Numeric (discrete, continuous)
What are some kinds of Data Statistics? - Answer-Categorical: % of each value,
Numeric: central tendency, dispersion
What elements make up Central Tendency? - Answer-Mean, Median, Mode,
Midrange
What elements make up Dispersion? - Answer-Range (max - min),
Quartiles(Q1:25%, Q3:75%), IQR (Q3-Q1), Variance, Standard Deviation
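As an illustration (not from the exam material), the central-tendency and dispersion statistics above can be computed for a made-up sample with NumPy and the standard library:

import numpy as np
from statistics import mode

data = np.array([4, 8, 15, 16, 23, 42, 8])    # made-up sample values

mean = data.mean()
median = np.median(data)
most_common = mode(data.tolist())             # mode
midrange = (data.max() + data.min()) / 2

value_range = data.max() - data.min()         # range = max - min
q1, q3 = np.percentile(data, [25, 75])        # quartiles
iqr = q3 - q1                                 # IQR = Q3 - Q1
variance = data.var(ddof=1)                   # sample variance
std_dev = data.std(ddof=1)                    # sample standard deviation
print(mean, median, most_common, midrange, value_range, iqr, variance, std_dev)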
What are some examples of plot types for data visualization? - Answer-Boxplots,
histograms, scatterplots, pie, line, heatmap, word cloud, network, area, bubble
Object Similarity - Answer-Computed from the data matrix: n objects x p attributes
Object Dissimilarity - Answer-Stored in the dissimilarity matrix: n objects x n objects
Nominal Similarity - Answer-s=1 if x=y, otherwise s=0
Nominal Dissimilarity - Answer-d=0 if x=y, otherwise d=1
Binary Symmetry - Answer-Y and N are equally likely and equally important
Binary Asymmetry - Answer-Y is rarer (and therefore more informative) than N
Symmetric Variables Equation - Answer-d(i,j) = (r + s) / (q + r + s + t), where q = attributes that are 1 for both i and j, r = 1 for i only, s = 1 for j only, t = 0 for both
Asymmetric Variables Equation - Answer-d(i,j) = (r + s) / (q + r + s); sim(i,j) = q / (q + r + s) = 1 - d(i,j), also known as the Jaccard coefficient
Jaccard coefficient - Answer-(q / (q+r+s))
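A minimal sketch of the symmetric, asymmetric, and Jaccard computations, using the contingency-table counts q, r, s, t described above (the binary vectors here are invented):

def binary_contingency(x, y):
    # x, y: equal-length lists of 0/1 values
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return q, r, s, t

i = [1, 0, 1, 1, 0]
j = [1, 1, 0, 1, 0]
q, r, s, t = binary_contingency(i, j)          # q=2, r=1, s=1, t=1
print((r + s) / (q + r + s + t))               # symmetric dissimilarity: 0.4
print((r + s) / (q + r + s))                   # asymmetric dissimilarity: 0.5
print(q / (q + r + s))                         # Jaccard coefficient: 0.5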
Ordinal Attributes - Answer-Replace each value by its rank r(if) in {1,...,Mf} and map it onto [0,1]: z(if) = (r(if) - 1)/(Mf - 1)
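For example (the levels and their ordering are invented), an ordinal attribute with Mf = 3 levels maps onto [0, 1] like this:

levels = ["low", "medium", "high"]                 # ordered categories
Mf = len(levels)
rank = {v: i + 1 for i, v in enumerate(levels)}    # r(if) in {1, ..., Mf}

def ordinal_to_numeric(value):
    return (rank[value] - 1) / (Mf - 1)            # z(if) = (r(if) - 1) / (Mf - 1)

print([ordinal_to_numeric(v) for v in levels])     # [0.0, 0.5, 1.0]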
Numeric Object Dissimilarity - Answer-Usually measured by distance, e.g. the Minkowski distance (L_p norm)
Minkowski Distance - Answer-d(i,j) = (|xi1 - xj1|^p + ... + |xin - xjn|^p)^(1/p), where p=1 gives the Manhattan distance and p=2 gives the Euclidean distance
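A short sketch of the formula above (the sample vectors are invented):

def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = [1, 2, 3], [4, 6, 3]
print(minkowski(x, y, p=1))   # Manhattan distance: 3 + 4 + 0 = 7
print(minkowski(x, y, p=2))   # Euclidean distance: sqrt(9 + 16 + 0) = 5.0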
Distance Measure Properties - Answer-Non-negativity, symmetry (d(i,j) = d(j,i)), and the triangle inequality: d(i,j) <= d(i,k) + d(k,j)
cosine similarity - Answer-cos(A,B) = (A . B) / (||A|| ||B||), where A . B is the dot product and ||A|| = sqrt(sum of Ai^2)
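A minimal sketch of the cosine-similarity computation (the vectors are invented):

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0, 1], [1, 1, 0]))  # 1 / (sqrt(2) * sqrt(2)) = 0.5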
What operations are involved with sequential data and time series? - Answer-
Euclidean matching, dynamic time warping, minimum jump cost
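Of these, dynamic time warping is the least obvious to compute; below is a minimal dynamic-programming sketch (not the course's implementation; the sequences are invented):

import math

def dtw_distance(a, b):
    # cost[i][j] = minimal warping cost aligning a[:i] with b[:j]
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # stretch a, stretch b, or advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 4]))  # 0.0: warping absorbs the repeated 1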
Mixed Attribute Types - Answer-Weighted sum across attributes: d(i,j) = sum_f(delta_ij(f) * d_ij(f)) / sum_f(delta_ij(f)), where d_ij(f) is the dissimilarity on attribute f and delta_ij(f) = 0 if attribute f cannot be compared (e.g. a missing value), 1 otherwise
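A toy sketch of that weighted combination (the per-attribute dissimilarities and indicators are invented):

def mixed_dissimilarity(per_attribute_d, indicators):
    # per_attribute_d: d_ij(f) values in [0, 1]; indicators: delta_ij(f), 0 = skip, 1 = count
    num = sum(delta * d for delta, d in zip(indicators, per_attribute_d))
    den = sum(indicators)
    return num / den if den else 0.0

# three attributes, the second one not comparable (e.g. missing for object i)
print(mixed_dissimilarity([0.2, 0.9, 0.5], [1, 0, 1]))  # (0.2 + 0.5) / 2 = 0.35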
When to use Euclidean/Manhattan processes? - Answer-Dense, continuous data
When to ignore null/null cases? - Answer-asymmetric attributes
When to use cosine similarity or Jaccard similarity? - Answer-sparse data
When to use seasonal patterns or subgroups? - Answer-Subset data
In a boxplot, what does the IQR represent? - Answer-The height of the box (the distance from Q1 to Q3)
In what ways can one transform data? - Answer-Smoothing, aggregation,
generalization, normalization, discretization, attribute construction
Formula for min-max normalization - Answer-v' = (v-min)/(max-min) * (max' - min') +
min'
Formula for mean normalization - Answer-v' = (v-mean)/(max-min)
Formula for standardized normalization - Answer-v' = (v-mean)/stdev
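A sketch of the three normalization formulas above, applied to made-up salary-style numbers:

def min_max(v, vmin, vmax, new_min=0.0, new_max=1.0):
    return (v - vmin) / (vmax - vmin) * (new_max - new_min) + new_min

def mean_normalize(v, mean, vmin, vmax):
    return (v - mean) / (vmax - vmin)

def z_score(v, mean, stdev):
    return (v - mean) / stdev

# example values: min=12000, max=98000, mean=54000, stdev=16000, v=73600
print(min_max(73600, 12000, 98000))                 # ~0.716
print(mean_normalize(73600, 54000, 12000, 98000))   # ~0.228
print(z_score(73600, 54000, 16000))                 # 1.225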
What does discretization involve? - Answer-Continuous->intervals, Split or merge,
Supervised or unsupervised labels
Methods for Unsupervised Discretization - Answer-Binning/Histogram Analysis,
Clustering Analysis, Intuitive partitioning
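For instance (data invented, assuming NumPy is available), equal-width and equal-frequency binning can be sketched as:

import numpy as np

values = np.array([4, 8, 15, 16, 23, 42, 55, 61])
k = 3

# equal-width binning: split the value range into k intervals of the same width
width_edges = np.linspace(values.min(), values.max(), k + 1)
equal_width_bins = np.digitize(values, width_edges[1:-1])

# equal-frequency binning: interior edges at the 1/3 and 2/3 quantiles
freq_edges = np.quantile(values, [1 / 3, 2 / 3])
equal_freq_bins = np.digitize(values, freq_edges)

print(equal_width_bins)   # 0-based bin index per value
print(equal_freq_bins)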
Properties for Supervised Discretization - Answer-Pre-determined class labels,
entropy-based interval splitting, X^2 analysis-based interval merging
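A toy illustration of entropy-based interval splitting (data and threshold invented): the split point that maximizes information gain against the class labels is chosen.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_info_gain(values, labels, threshold):
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

vals = [1, 2, 3, 10, 11, 12]
labs = ["a", "a", "a", "b", "b", "b"]
print(split_info_gain(vals, labs, threshold=3))  # 1.0: this split separates the classes perfectly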
Properties of Data Reduction - Answer-Dimensionality reduction reduces the number of attributes; numerosity reduction reduces the number of objects
Properties of Attribute Selection - Answer-Forward selection, Backward elimination,
Feature engineering
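As a hedged sketch (assuming scikit-learn is installed; this is not part of the course material), forward selection and backward elimination can be run with SequentialFeatureSelector:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="forward",        # use "backward" for backward elimination
)
selector.fit(X, y)
print(selector.get_support())   # boolean mask over the original attributes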
Feature engineering - Answer-The process of determining which features might be
useful in training a model, and then converting raw data from log files and other
sources into said features. In TensorFlow, feature engineering often means converting raw log file entries into tf.Example protocol buffers.