Exam (elaborations)

DATA MINING FINAL EXAM QUESTIONS AND ANSWERS

Pages
5
Uploaded on
26-03-2025
Written in
2024/2025

Institution
DATA MINING
Course
DATA MINING









DATA MINING FINAL EXAM
QUESTIONS AND ANSWERS
What is an ANN (Artificial Neural Network)? Execution? - Answer-A model that attempts
to replicate the nonlinear learning found in nature.
Execution:
1. prepare data
2. design network architecture
3. initialize
4. train using backpropagation
5. evaluate
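
The five steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the XOR data, the 2-3-1 layer sizes, the sigmoid activation, the squared-error loss, and the names `train_xor`, `hidden`, `lr`, and `seed` are all assumptions chosen for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(hidden=3, epochs=2000, lr=0.5, seed=0):
    """Full-batch backpropagation on XOR; returns the loss recorded at each epoch."""
    rng = random.Random(seed)
    # 1. Prepare data: the XOR truth table (a nonlinear problem)
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    y = [0.0, 1.0, 1.0, 0.0]
    # 2. Design architecture: 2 inputs -> `hidden` sigmoid units -> 1 sigmoid output
    # 3. Initialize: small random weights, zero biases
    W1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    # 4. Train using backpropagation
    for _ in range(epochs):
        gW1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for (x1, x2), t in zip(X, y):
            # Forward pass
            h = [sigmoid(W1[j][0] * x1 + W1[j][1] * x2 + b1[j]) for j in range(hidden)]
            out = sigmoid(sum(W2[j] * h[j] for j in range(hidden)) + b2)
            loss += 0.5 * (out - t) ** 2
            # Backward pass: propagate the error signal from output to hidden layer
            d_out = (out - t) * out * (1.0 - out)
            gb2 += d_out
            for j in range(hidden):
                d_h = d_out * W2[j] * h[j] * (1.0 - h[j])
                gW2[j] += d_out * h[j]
                gW1[j][0] += d_h * x1
                gW1[j][1] += d_h * x2
                gb1[j] += d_h
        # Gradient-descent update
        for j in range(hidden):
            W2[j] -= lr * gW2[j]
            W1[j][0] -= lr * gW1[j][0]
            W1[j][1] -= lr * gW1[j][1]
            b1[j] -= lr * gb1[j]
        b2 -= lr * gb2
        losses.append(loss)
    return losses

# 5. Evaluate: compare the loss before and after training
losses = train_xor()
print(f"loss at first epoch: {losses[0]:.4f}, at last epoch: {losses[-1]:.4f}")
```

In practice evaluation would use held-out data; the training loss is printed here only to show that the weights are being adjusted.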

What is an SVM (Support Vector Machine)? Execution? - Answer-A versatile algorithm
that maximizes the margin between classes by finding the optimal separating
hyperplane. It is suitable for complex classification and regression tasks.
Execution:
1. prepare/split data
2. train with suitable parameters
3. evaluate
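
A rough sketch of these steps for the simplest case, a linear soft-margin SVM trained by sub-gradient descent on the hinge loss. The toy data, the parameters `lam`, `lr`, and `epochs`, and the function names are assumptions for illustration; kernels and regression are not covered.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Minimize lam/2 * ||w||^2 + hinge loss with per-sample sub-gradient steps."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularization step: shrinking w corresponds to widening the margin
            w = [wj * (1 - lr * lam) for wj in w]
            if margin < 1:
                # Point is inside the margin or misclassified: push the hyperplane away
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# 1. Prepare/split data (labels are +1 / -1)
X_train = [(2, 2), (-2, -2), (3, 3), (-3, -3), (2, 3), (-2, -3)]
y_train = [1, -1, 1, -1, 1, -1]
X_test = [(4, 3), (-3, -4)]
y_test = [1, -1]

# 2. Train with chosen parameters
w, b = train_linear_svm(X_train, y_train)

# 3. Evaluate on the held-out points
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print("test accuracy:", accuracy)
```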

What are Bayesian Methods? Execution? - Answer-Methods that use Bayes'
theorem to compute and update probabilities after obtaining data
Execution:
1. define prior probabilities
2. update using observed data to obtain posterior probabilities
3. perform inference or predictions
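
As an illustration of the prior-to-posterior update, here is a small sketch using a made-up diagnostic test. The hypothesis names and the numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are assumptions, not values from the exam material.

```python
def posterior(prior, likelihood):
    """Bayes' theorem over a finite set of hypotheses: P(h | data) is proportional to P(data | h) * P(h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: v / evidence for h, v in unnormalized.items()}

# 1. Define prior probabilities
prior = {"disease": 0.01, "healthy": 0.99}

# 2. Update using observed data: P(positive test | hypothesis)
positive_test = {"disease": 0.95, "healthy": 0.05}
after_one = posterior(prior, positive_test)
# A second independent positive test uses the previous posterior as the new prior
after_two = posterior(after_one, positive_test)

# 3. Perform inference: report the most probable hypothesis
print(round(after_one["disease"], 3), round(after_two["disease"], 3))
print("most probable after two tests:", max(after_two, key=after_two.get))
```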

What are representative ensemble methods and their main idea? - Answer-They use a
combination of models to increase accuracy.
e.g., bagging, boosting, random forest
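
To make the "combination of models" idea concrete, here is a minimal bagging sketch: several one-split classifiers (decision stumps) are trained on bootstrap samples and combined by majority vote. The 1-D data and all function names are hypothetical; boosting and random forests differ in how the base models are built and weighted.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split classifier on 1-D data: returns (threshold, left_label, right_label)."""
    best = None
    for thr in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= thr else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, thr, left, right)
    return best[1:]

def stump_predict(stump, x):
    thr, left, right = stump
    return left if x <= thr else right

def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Combine the base models by majority vote."""
    votes = Counter(stump_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys)
print(bagging_predict(models, 0.5), bagging_predict(models, 9.5))
```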

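What is clustering? - Answer-Grouping data objects into clusters. A cluster is a
collection of data objects that are similar to one another within the same group and
dissimilar to the objects in other groups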

Good vs Bad clustering? - Answer-Good: high intra-class similarity (cohesive within
cluster), low inter-class similarity (distinctive between clusters)

Bad: shows poor separation and lack of clear structure

What is k-means? - Answer-An algorithm that partitions data into a specified
number of clusters. Each cluster is represented by its center (the mean of its
points), and each data point is assigned to the cluster whose center is nearest.
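
A minimal sketch of the assign-then-update loop, assuming Euclidean distance and, purely for reproducibility, the first k points as initial centers (real implementations usually pick initial centers randomly or with a smarter seeding scheme).

```python
import math

def kmeans(points, k, max_iter=100):
    """Alternate between assigning points to the nearest center and recomputing the means."""
    centers = [points[i] for i in range(k)]  # assumption: first k points as initial centers
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centers

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
labels, centers = kmeans(points, k=2)
print(labels)
```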

What is k-medoids? - Answer-Uses medoids (the most centrally located object in a
cluster) as reference points instead of the mean
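
The difference between a medoid and a mean shows up most clearly with an outlier. The sketch below only computes the medoid of one cluster of 1-D values; it is not a full k-medoids (PAM) implementation, and the data are invented.

```python
def medoid(values):
    """Return the member with the smallest total distance to all other members."""
    return min(values, key=lambda v: sum(abs(v - other) for other in values))

cluster = [1, 2, 3, 4, 100]          # 100 is an outlier
mean_value = sum(cluster) / len(cluster)
medoid_value = medoid(cluster)
print("mean:", mean_value, "medoid:", medoid_value)
```

The mean is pulled toward the outlier, while the medoid stays on an actual, centrally located data object.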

What is AGNES? (Agglomerative Nesting) - Answer-Uses the single-linkage method and a
dissimilarity matrix. It merges the nodes that have the lowest dissimilarity and
continues until all nodes are in the same cluster
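
A small sketch of this bottom-up merging, assuming 1-D values and absolute difference as the dissimilarity. For brevity the dissimilarities are computed on the fly rather than stored in a matrix, and the function name is hypothetical.

```python
def agnes_single_linkage(values):
    """Repeatedly merge the two clusters with the lowest single-linkage dissimilarity."""
    clusters = [[i] for i in range(len(values))]  # start with one cluster per object
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: dissimilarity of the closest pair of members
                d = min(abs(values[i] - values[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

merges = agnes_single_linkage([1, 2, 6, 7, 15])
for left, right, d in merges:
    print(left, "+", right, "at dissimilarity", d)
```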

How does BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) work?
- Answer-Designed for large datasets, it incrementally builds a hierarchical data
structure called a CF (clustering feature) tree to summarize the data. It uses a
2-phase approach to create and refine clusters.
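
The clustering feature itself is a compact summary that can be updated incrementally. A minimal sketch, assuming the common definition CF = (N, LS, SS) with SS stored as a single sum of squared norms; the class and method names are illustrative, and the tree structure is omitted.

```python
import math
from dataclasses import dataclass

@dataclass
class CF:
    n: int        # number of points summarized
    ls: tuple     # linear sum of the points, per dimension
    ss: float     # sum of squared norms of the points

    @classmethod
    def from_point(cls, p):
        return cls(1, tuple(p), sum(x * x for x in p))

    def merge(self, other):
        """CFs are additive, so clusters merge without revisiting the raw points."""
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        """Root-mean-square distance of the summarized points from the centroid."""
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))

cf = CF.from_point((1, 2)).merge(CF.from_point((3, 4))).merge(CF.from_point((5, 6)))
print(cf, cf.centroid(), round(cf.radius(), 4))
```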

DBSCAN Pros vs Cons - Answer-pros: resistant to noise/ can handle clusters of
different shapes and sizes
cons: cannot handle varying densities/ sensitive to parameters

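How to run DBSCAN? - Answer-1. randomly select a point p
2. retrieve all points density-reachable from p with respect to Eps and MinPts
3. if p is a core point, a cluster is formed; if p is a border point, no points are
density-reachable from it, so move on to the next point
4. continue the process until all points have been processed

Only one scan of the data is needed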
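
A compact sketch of this procedure, assuming Euclidean distance and a brute-force neighborhood search. Points are visited in index order rather than at random so the output is reproducible, and the names `eps` and `min_pts` stand in for Eps and MinPts.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # not a core point (may later become a border point)
            continue
        cluster += 1                     # p is a core point: a new cluster is formed
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:                     # collect everything density-reachable from p
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return labels

points = [(1, 1), (1.2, 1.1), (0.9, 1.0), (1.1, 0.9),
          (5, 5), (5.1, 5.2), (4.9, 5.1), (5.0, 4.9),
          (9, 1)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```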

How does a Kohonen network (self-organizing map) work? - Answer-1. competition
2. cooperation
3. adaptation
4. adjust the learning rate and neighborhood size as needed
5. stop when the termination criterion is met
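
These phases can be sketched for a tiny 1-D map of three units trained on scalar inputs. Everything specific here is an assumption made for illustration: the Gaussian neighborhood function, the linear decay schedule, the fixed epoch count used as the termination criterion, and the names `train_som`, `lr0`, and `sigma0`.

```python
import math

def train_som(data, weights, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a 1-D Kohonen map; `weights` holds one scalar weight per output unit."""
    weights = list(weights)
    for epoch in range(epochs):              # 5. stop after a fixed number of epochs
        # 4. Adjust (decay) the learning rate and neighborhood size
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.01)
        for x in data:
            # 1. Competition: the unit closest to the input wins
            winner = min(range(len(weights)), key=lambda i: abs(x - weights[i]))
            for i in range(len(weights)):
                # 2. Cooperation: units near the winner on the map share the update
                h = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
                # 3. Adaptation: move the unit's weight toward the input
                weights[i] += lr * h * (x - weights[i])
    return weights

data = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
trained = train_som(data, weights=[0.4, 0.5, 0.6])
print([round(w, 3) for w in trained])
```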

Hopkins Statistic - Answer-A measure used to assess the clustering tendency of a
dataset by quantifying the degree of clustering vs randomness in the data
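
One way to compute it is sketched below. Conventions differ between sources: under the form assumed here, values near 1 suggest clustered data and values near 0.5 suggest randomness, while some textbooks use the complementary ratio. The sample size `m`, the bounding-box sampling, and the function name are illustrative choices.

```python
import math
import random

def hopkins(points, m, seed=0):
    """Compare nearest-neighbor distances of random locations (u) and of real points (w)."""
    rng = random.Random(seed)
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    # u: distance from m uniformly random locations to their nearest data point
    u = []
    for _ in range(m):
        q = [rng.uniform(lows[d], highs[d]) for d in range(dims)]
        u.append(min(math.dist(q, p) for p in points))
    # w: distance from m sampled data points to their nearest other data point
    w = []
    for i in rng.sample(range(len(points)), m):
        w.append(min(math.dist(points[i], p) for j, p in enumerate(points) if j != i))
    return sum(u) / (sum(u) + sum(w))

points = [(1, 1), (1.2, 1.1), (0.9, 1.0), (1.1, 0.9),
          (5, 5), (5.1, 5.2), (4.9, 5.1), (5.0, 4.9)]
h = hopkins(points, m=4)
print(round(h, 3))
```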

Silhouette Coefficient - Answer-Evaluates clustering quality by measuring
compactness and separation
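
A minimal sketch of the usual per-point formula s = (b - a) / max(a, b), where a is the mean distance to the point's own cluster (compactness) and b is the mean distance to the nearest other cluster (separation). It assumes 1-D points and at least two points per cluster; the data are invented.

```python
def silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points; assumes every cluster has >= 2 points."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        a = sum(same) / len(same)      # compactness: mean distance within own cluster
        b = min(                       # separation: mean distance to nearest other cluster
            sum(abs(p - q) for q, l in zip(points, labels) if l == other)
            / sum(1 for l in labels if l == other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [1, 2, 8, 9]
good = silhouette(points, [0, 0, 1, 1])   # cohesive, well-separated clusters
bad = silhouette(points, [0, 1, 0, 1])    # clusters that mix distant points
print(round(good, 3), round(bad, 3))
```

Scores close to 1 indicate a good clustering, and negative scores indicate points that sit closer to another cluster than to their own.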

What is association rule mining and the motivation? - Answer-A technique used in
data mining to discover relationships or associations between variables in large
datasets.

motivation: finding inherent regularities in data
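
Rules are usually scored with support, confidence, and lift. The sketch below computes them for a made-up set of shopping baskets; the transactions and the rule {bread} -> {butter} are assumptions for illustration.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Ratio above 1 means the items co-occur more often than if independent."""
    return confidence(antecedent, consequent) / support(consequent)

rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
rule_lift = lift({"bread"}, {"butter"})
print(rule_support, rule_confidence, rule_lift)
```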

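What is frequent itemset mining? How is it done? - Answer-A task in data mining that
involves identifying sets of items that regularly occur together in a dataset. It is
typically done by counting how often candidate itemsets appear and keeping those that
meet a minimum support threshold, for example with the Apriori algorithm.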

Apriori - Answer-Efficiently identifies frequent itemsets in a dataset and generates
association rules from those itemsets
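
The frequent-itemset half of Apriori can be sketched as a level-wise search that relies on the property that every subset of a frequent itemset must also be frequent. The transactions and the `min_support` value are made up, and rule generation is left out.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting the minimum support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: build size-k candidates from frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates with an infrequent subset, then check support
        current = {c for c in candidates
                   if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
                   and support(c) >= min_support}
    return frequent

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
]
frequent = apriori(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```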

What is confidence interval estimation? - Answer-A statistical technique used to
estimate a range within which a population parameter is likely to lie with a specified
level of confidence.
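
As a worked illustration, the sketch below estimates a confidence interval for a population mean using the normal approximation (mean plus or minus z times the standard error). The sample values are invented, and for a sample this small a t-based interval would be somewhat wider.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval: mean +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * stdev(sample) / math.sqrt(len(sample))
    center = mean(sample)
    return center - margin, center + margin

sample = [4.8, 5.1, 5.0, 4.9, 5.2]
low, high = mean_confidence_interval(sample)
print(round(low, 3), round(high, 3))
```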

What is hypothesis testing? - Answer-A procedure used to make inferences about a
population based on sample data
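
One common instance is a two-sided one-sample z-test, sketched below under the assumption that the population standard deviation is known. The numbers (null mean 100, sample mean 103, sigma 10, n = 25) and the 0.05 significance level are made up for the example.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Return the z statistic and two-sided p-value for H0: population mean == mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = z_test(sample_mean=103, mu0=100, sigma=10, n=25)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 2), round(p_value, 4), decision)
```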

biggdreamer Havard School