Introduction to Data Mining Exam
Questions and Answers
Clustering - Answer-Descriptive
* Given a set of data points, each having a set of attributes, and a similarity measure
among them, find clusters such that:
- Data points in one cluster are more similar to one another.
- Data points in separate clusters are less similar to one another.
* Similarity measures:
- Euclidean distance if attributes are continuous.
- Other problem-specific measures.
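As an illustration of the definition above, here is a minimal sketch that groups 2-D points using Euclidean distance. It assumes k-means as the clustering procedure, which these notes do not name; the function names, starting centroids, and sample points are all hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length numeric points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, centroids, iterations=10):
    """Tiny k-means sketch: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical sample data: two visually separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Points in the same returned cluster are closer to each other than to points in the other cluster, which is the property the definition asks for.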
Association Rule Discovery - Answer-Descriptive
* Given a set of records, each of which contains some number of items from a given
collection;
* Produce dependency rules that predict the occurrence of an item based on
the occurrences of other items.
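A dependency rule such as {diaper} -> {beer} is commonly scored by its support and confidence. These two measures are standard, but the notes do not name them, so treat the sketch below as one possible way to evaluate a rule. The transactions and function names are hypothetical.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(transactions, itemset):
    """Fraction of transactions that contain the whole itemset."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    ante = count(transactions, antecedent)
    both = count(transactions, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0

# Hypothetical market-basket records.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "coke"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "coke"},
]
rule_support = support(transactions, {"diaper", "beer"})
rule_confidence = confidence(transactions, {"diaper"}, {"beer"})
```

Here three of the five records contain both items, and three of the four records with diaper also contain beer.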
Sequential Pattern Discovery - Answer-Descriptive
Given a set of objects, each associated with its own timeline of events,
find rules that predict strong sequential dependencies among different events.
Rules are formed by first discovering patterns. Event occurrences in the patterns are
governed by timing constraints.
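To make the idea of a pattern with a timing constraint concrete, the sketch below counts how many objects have event A followed by event B within a maximum time gap. The maximum-gap constraint is only one possible timing constraint, and the timelines, event names, and function names are hypothetical.

```python
def occurs(timeline, first, second, max_gap):
    """True if `second` happens after `first` within `max_gap` time units."""
    return any(
        e1 == first and e2 == second and 0 < t2 - t1 <= max_gap
        for t1, e1 in timeline
        for t2, e2 in timeline
    )

def pattern_support(timelines, first, second, max_gap):
    """Fraction of objects whose timeline contains the pattern first -> second."""
    hits = sum(1 for tl in timelines.values() if occurs(tl, first, second, max_gap))
    return hits / len(timelines)

# Hypothetical timelines: each object maps to (timestamp, event) pairs.
timelines = {
    "customer_1": [(1, "buy_phone"), (3, "buy_case"), (10, "buy_charger")],
    "customer_2": [(2, "buy_phone"), (20, "buy_case")],
    "customer_3": [(5, "buy_case"), (6, "buy_phone"), (8, "buy_case")],
    "customer_4": [(1, "buy_charger")],
}
phone_then_case = pattern_support(timelines, "buy_phone", "buy_case", max_gap=7)
```

A pattern that holds for a large share of objects is a candidate for a rule; customer_2 does not count here because the gap between the two events exceeds the constraint.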
Regression - Answer-Predictive
* Predict a value of a given continuous valued variable based on the values of other
variables, assuming a linear or nonlinear model of dependency.
* Studied extensively in statistics and in the neural network field.
* Examples:
- Predicting sales amounts of a new product based on advertising expenditure.
- Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
- Time series prediction of stock market indices.
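For the linear case, a minimal sketch of predicting sales from advertising spend with ordinary least squares on a single variable could look like this. The data values and function names are hypothetical, and a real problem would usually involve several input variables or a nonlinear model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted value of the continuous target for input x."""
    return slope * x + intercept

# Hypothetical data: advertising expenditure vs. sales amount (arbitrary units).
advertising = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(advertising, sales)
forecast = predict(5.0, slope, intercept)
```

Because the sample points lie exactly on a line, the fit recovers that line; with noisy data the same formulas give the line that minimizes the squared error.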
Deviation Detection - Answer-Predictive
* Detect significant deviations from normal behavior
* Applications:
- Credit Card Fraud Detection
- Network Intrusion Detection
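One simple way to flag a significant deviation from normal behavior is a z-score test: a value is suspicious when it lies more than a chosen number of standard deviations from the mean. This is only an assumed technique for illustration; the notes do not say which method is intended, and the threshold, amounts, and function name are hypothetical.

```python
import statistics

def flag_deviations(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds `threshold`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # All values identical: nothing deviates.
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / std > threshold]

# Hypothetical card transaction amounts; the last one is far from the usual range.
amounts = [20, 22, 19, 21, 23, 20, 500]
flagged = flag_deviations(amounts)
```

With very small samples a single extreme value inflates the standard deviation, so the threshold has to be chosen with the sample size in mind.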
Challenges of Data Mining - Answer-Scalability
Dimensionality