EXAM #2 - DATA MINING QUESTIONS AND ANSWERS
Step 1: Business Understanding - Answer-
- Know what the analysis is for
- Specific goals tied to potential action are critical
- These goals allow development of a project plan
Step 2: Data Understanding - Answer-
- Know what data is relevant
- Know what data is available or acquirable
- Understand the data types (they determine the analytic technique)
Step 3: Data Preparation - Answer-
- Gather relevant data and integrate it across sources
- Clean data to the extent possible (more later)
- Check data for quality (more later)
- Transform data into consistent formats, ranges, and aggregations as necessary
- Remove unnecessary or redundant data
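The preparation bullets above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records, the field names (`id`, `name`, `age`, `spend`), and the choice of min-max scaling as the "consistent range" transform are hypothetical, not part of the course material.

```python
# Hypothetical records from two sources; all field names are illustrative.
customers = [
    {"id": 1, "name": " Ann ", "age": "34"},
    {"id": 2, "name": "Bob", "age": None},
    {"id": 2, "name": "Bob", "age": None},   # redundant duplicate row
    {"id": 3, "name": "Cy", "age": "51"},
]
purchases = {1: 120.0, 2: 80.0, 3: 200.0}    # id -> total spend

def prepare(customers, purchases):
    seen, rows = set(), []
    for c in customers:
        if c["id"] in seen:            # remove redundant data
            continue
        seen.add(c["id"])
        if c["age"] is None:           # quality check: drop rows missing a required field
            continue
        rows.append({
            "id": c["id"],
            "name": c["name"].strip(),              # clean stray whitespace
            "age": int(c["age"]),                   # consistent data type
            "spend": purchases.get(c["id"], 0.0),   # integrate the second source
        })
    # Transform: min-max scale spend into the range [0, 1].
    lo = min(r["spend"] for r in rows)
    hi = max(r["spend"] for r in rows)
    for r in rows:
        r["spend_scaled"] = (r["spend"] - lo) / (hi - lo) if hi > lo else 0.0
    return rows

prepared = prepare(customers, purchases)
```

Dropping rows with a missing age is only one possible quality policy; filling in a default value would be an equally valid choice depending on the business goal.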
Step 4: Model Building - Answer-Select an appropriate technique based on need and data types
Step 5: Testing and Evaluation - Answer-
- Often uses a held-out portion of the dataset
- Evaluate outcome for reasonableness
- Refine the model as necessary
- Retest
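As a rough illustration of testing on a portion of the dataset, the sketch below holds out part of the data and scores a deliberately trivial model on it. The helper names (`holdout_split`, `majority_label`, `accuracy`), the 30% test fraction, and the toy data are all assumptions chosen for the example; a real project would substitute its own model and metric.

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and return (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def majority_label(train):
    """A stand-in 'model': always predict the most common training label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

def accuracy(predicted_label, test):
    """Fraction of held-out rows whose label matches the prediction."""
    return sum(1 for _, label in test if label == predicted_label) / len(test)

# Toy dataset of (feature, label) pairs; two thirds of the labels are "yes".
data = [(i, "yes" if i % 3 else "no") for i in range(30)]
train, test = holdout_split(data)
model = majority_label(train)
score = accuracy(model, test)
```

If the score looks unreasonable, the refine-and-retest loop amounts to changing the model and calling `accuracy` again on the same held-out rows.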
Step 6: Deployment - Answer-
- ACTION!!
- May involve further testing, refinement, or a new business policy/process
Predictive Data Mining branches - Answer-
- Classification
- Prediction
- Time-Series Analysis
Descriptive Data Mining branches - Answer-
- Association
- Clustering
- Summarization
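Support for Rule = - Answer-(transactions containing both Condition and Result) / (all transactions)
Support for Condition = - Answer-(transactions containing Condition) / (all transactions)
Support for Result = - Answer-(transactions containing Result) / (all transactions)
Confidence = - Answer-(support for rule) / (support for condition)
Lift = - Answer-(confidence in rule) / (support for result)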
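The five formulas above can be checked on a small example. The sketch below is a minimal illustration, assuming a made-up list of market-basket transactions and hypothetical helper names (`support`, `rule_metrics`); it uses `fractions.Fraction` so the ratios stay exact, and it assumes the condition appears in at least one transaction.

```python
from fractions import Fraction

# Made-up transactions for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Share of all transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def rule_metrics(condition, result, transactions):
    s_rule = support(condition | result, transactions)  # both condition and result
    s_cond = support(condition, transactions)
    s_res = support(result, transactions)
    confidence = s_rule / s_cond      # support for rule / support for condition
    lift = confidence / s_res         # confidence / support for result
    return s_rule, s_cond, s_res, confidence, lift

# Rule: bread -> milk
s_rule, s_cond, s_res, confidence, lift = rule_metrics({"bread"}, {"milk"}, transactions)
```

For this data the rule bread -> milk has support 3/5, confidence 3/4, and lift 15/16. A lift below 1 means that buying bread makes milk slightly less likely than its baseline rate in these transactions.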