Exam (elaborations)

DATA MINING EXAM REVIEW QUESTIONS WITH CORRECT ANSWERS

Pages
10
Uploaded on
26-03-2025
Written in
2024/2025

Institution
DATA MINING
Course
DATA MINING

Phase 2: Data Collection - Answer--Initial Data Collection
-Data are often collected for purposes unrelated to the current business problem.
This is very common in most companies.

-Proceeds with activities aimed at:
  -Understanding the data: relevance, cost, and reliability
  -Identifying data quality problems

Phases 1 & 2 - Answer-The initial problem formulation may be incomplete, suboptimal, or infeasible, so multiple iterations between the two phases may be needed before an acceptable formulation emerges.

The goal is a successful data mining formulation that can later be solved with the available data.

Phase 3: Data preparation - Answer-Can take over 90% of the time!

-Covers all activities that construct the final dataset (the data that will be fed into the modeling tool(s)) from the initial raw data

Phase 4: Modeling - Answer-Selecting modeling techniques and calibrating their parameters

Typically, there are several techniques for the same data mining problem type.

-Generate a test design, then test the model's quality and validity.

Phase 5: Evaluation - Answer--Review the process
-Evaluate performance
-Choose the right evaluation metric
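Why the choice of metric matters can be sketched on a hypothetical, imbalanced data set (95 negatives and 5 positives, made up for illustration). Accuracy and recall are standard metrics used here only as an example; the notes do not name a specific metric. A model that always predicts the majority class looks good on accuracy but finds none of the positives.

```python
# Hypothetical labels: 1 = rare positive class, 0 = negative class.
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model labels as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```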

Phase 6: Deployment - Answer-Determine how the results need to be utilized

-Who needs to use them?
-How often do they need to be utilized?

Data Preparation Steps in Rattle: - Answer-Step 1: Load the data and partition it
Step 2: Identify the correct type of each feature
Step 3: Deal with missing values
Step 4: Transform features into the correct form
Step 5: Identify the correct input and target variables

Validation set: - Answer-Used to tune model parameters. Not all modeling algorithms need a validation set.

Test Set: - Answer-To assess the likely future performance of a model (test data
does not participate in the training or parameter tuning steps)
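A minimal sketch of partitioning data into training, validation, and test sets, in plain Python rather than Rattle (which does this through its own interface). The 70/15/15 proportions, the `partition` helper, and the fixed seed are illustrative assumptions; whatever remains after the training and validation shares becomes the test set, which is never used for training or tuning.

```python
import random

def partition(rows, train=0.70, validation=0.15, seed=42):
    """Shuffle rows and split them into train/validation/test subsets (hypothetical helper)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # held out for the final performance estimate
    return train_set, val_set, test_set

train_set, val_set, test_set = partition(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```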

Nominal: - Answer-has two or more categories, but there is no intrinsic ordering to
the categories

Ordinal - Answer-Similar to nominal (categorical), but there is a clear ordering of the categories
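One common way to respect the nominal/ordinal distinction when encoding features is sketched below; the values are hypothetical, and one-hot and rank encoding are typical choices rather than something the notes prescribe. Nominal categories get one indicator column each so no artificial order is implied, while ordinal levels map to ranks that preserve their known order.

```python
# Hypothetical example values.
colors = ["red", "green", "blue", "green"]     # nominal: no intrinsic order
sizes = ["small", "large", "medium", "small"]  # ordinal: small < medium < large

# Nominal: one-hot (dummy) encoding, one 0/1 column per category.
categories = sorted(set(colors))
one_hot = [[int(c == cat) for cat in categories] for c in colors]

# Ordinal: replace each level with its rank in the known order.
size_order = {"small": 0, "medium": 1, "large": 2}
ranks = [size_order[s] for s in sizes]

print(categories)  # ['blue', 'green', 'red']
print(one_hot)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(ranks)       # [0, 2, 1, 0]
```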

Reasons for missing values - Answer--Information is not collected
-Attributes may not be applicable to all cases

Handle missing values - Answer--Delete features that have missing values
-Delete observations with missing values
-Impute
-Replace, or treat missing values as their own category
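The deletion strategies and the "treat as category" option can be sketched in plain Python on a hypothetical toy data set where `None` marks a missing value. Dropping any feature that has a missing value is the simplest rule; in practice a threshold on the fraction missing is often used instead.

```python
# Hypothetical toy data set; None marks a missing value.
rows = [
    {"age": 25, "income": 40000, "region": "north"},
    {"age": None, "income": 52000, "region": "south"},
    {"age": 41, "income": 38000, "region": None},
    {"age": 35, "income": 61000, "region": "north"},
]

# Delete features: keep only columns with no missing values.
complete_features = [f for f in rows[0] if all(r[f] is not None for r in rows)]
without_missing_features = [{f: r[f] for f in complete_features} for r in rows]

# Delete observations: keep only rows with no missing values.
complete_rows = [r for r in rows if all(v is not None for v in r.values())]

# Treat as category: give missing categorical values their own label.
region_as_category = [r["region"] if r["region"] is not None else "MISSING" for r in rows]

print(complete_features)   # ['income']
print(len(complete_rows))  # 2
print(region_as_category)  # ['north', 'south', 'MISSING', 'north']
```

Deleting features keeps every observation but loses whole columns; deleting observations keeps every column but can shrink the data set quickly when missing values are scattered.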

Imputation - Answer-Replacing missing data with substitute values estimated from the data set

-Mean/median/mode imputation (Rattle)
-Regression imputation
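A minimal sketch of both imputation families, assuming `None` marks a missing value; the helper names `impute` and `regression_impute` are hypothetical and this is not Rattle's implementation. Mean/median/mode imputation fills every gap with one statistic of the observed values, while regression imputation predicts each missing value from another feature using a least-squares line fitted on the complete pairs.

```python
import statistics

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

def regression_impute(x, y):
    """Fill missing y values from a least-squares line y = a + b*x fitted on complete pairs."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = statistics.mean(a for a, _ in pairs)
    my = statistics.mean(b for _, b in pairs)
    slope = sum((a - mx) * (b - my) for a, b in pairs) / sum((a - mx) ** 2 for a, _ in pairs)
    intercept = my - slope * mx
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]

ages = [25, None, 41, 35, 25]
print(impute(ages, "mean"))    # [25, 31.5, 41, 35, 25]
print(impute(ages, "median"))  # [25, 30.0, 41, 35, 25]
print(impute(ages, "mode"))    # [25, 25, 41, 35, 25]
print(regression_impute([1, 2, 3, 4], [2, 4, None, 8]))  # missing value estimated close to 6
```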

Normalization - Answer-Change the range or distribution of the data

Recenter - Answer-Move the distribution such that the mean of the feature is 0

Rescale (usually preferred) - Answer-Scale the feature so that its range is 0 to 1:

X_i' = (X_i - X_min) / (X_max - X_min)
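Recentering and min-max rescaling can be sketched directly from their definitions: subtract the mean, or apply (Xi - Xmin)/(Xmax - Xmin). The function names are hypothetical, and the guard against a constant feature is an added assumption, since the formula divides by zero when Xmax equals Xmin.

```python
import statistics

def recenter(values):
    """Shift values so that their mean is 0: x_i - mean(x)."""
    mu = statistics.mean(values)
    return [v - mu for v in values]

def rescale(values):
    """Min-max rescaling to [0, 1]: (x_i - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]

x = [10, 20, 30, 50]
print(recenter(x))  # [-17.5, -7.5, 2.5, 22.5]
print(rescale(x))   # [0.0, 0.25, 0.5, 1.0]
```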

Numeric to categorical - Answer-Discretization (sometimes necessary, depending on the model): recode numeric values into intervals (bins)
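The notes do not fix a binning rule here; one common choice, equal-width binning, is sketched below with a hypothetical helper. The range from min to max is split into `k` intervals of the same width, so an extreme value can leave some bins crowded and others nearly empty.

```python
def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant feature: everything in one bin
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

x = [1, 2, 3, 4, 10]
print(equal_width_bins(x, 3))  # [0, 0, 0, 1, 2]
```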

Quantiles - Answer-Equal-frequency binning: each bin contains the same number of observations
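Quantile (equal-frequency) binning can be sketched by ranking the values and cutting the ranks into `k` equal groups; the helper name is hypothetical. Bins hold (about) `len(values) / k` observations each regardless of how spread out the values are, and tied values may be split across neighbouring bins with this simple rank-based rule.

```python
def equal_frequency_bins(values, k):
    """Assign bin indices 0..k-1 so each bin holds (about) the same number of values."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)  # ranks 0..n-1 mapped onto k equal groups
    return bins

x = [1, 2, 3, 4, 10, 7]
print(equal_frequency_bins(x, 3))  # [0, 0, 1, 1, 2, 2]
```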

Kurtosis - Answer--A measure of the "tailedness" of a distribution

-A useful indicator of whether a data set has an outlier problem: larger kurtosis means heavier tails and a more serious outlier problem
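A minimal sketch of the moment-based (population) kurtosis m4 / m2², reported as excess kurtosis (minus 3, so a normal distribution scores about 0). The helper name is hypothetical, and R packages used by Rattle may apply a sample-size-adjusted estimator that gives slightly different numbers. Adding a single extreme value raises the result sharply.

```python
def kurtosis(values, excess=True):
    """Population kurtosis m4 / m2**2, optionally minus 3 (excess kurtosis)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment (variance)
    m4 = sum((v - mu) ** 4 for v in values) / n  # fourth central moment
    k = m4 / m2 ** 2
    return k - 3 if excess else k

no_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]
print(kurtosis(no_outlier))    # about -1.3 (light tails)
print(kurtosis(with_outlier))  # noticeably larger, driven by the value 50
```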

Numeric Attributes, Single Feature Visualization - Answer-Box plot, histogram, cumulative plot
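The numbers behind two of these plots can be computed without a plotting library, as sketched below. The 1.5 × IQR whisker rule is the common Tukey convention and the "inclusive" quartile method is an assumption; plotting tools may use other quartile definitions. The cumulative plot is drawn from (x, fraction of values ≤ x) pairs, i.e. the empirical cumulative distribution.

```python
import statistics
from bisect import bisect_right

def box_plot_summary(values):
    """Min, quartiles, max, and points beyond the 1.5*IQR fences (assumed convention)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"min": min(values), "q1": q1, "median": med, "q3": q3,
            "max": max(values), "outliers": outliers}

def ecdf(values):
    """Points of a cumulative plot: (x, fraction of values <= x) for each distinct x."""
    xs = sorted(values)
    n = len(xs)
    return [(x, bisect_right(xs, x) / n) for x in sorted(set(xs))]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(box_plot_summary(data))
print(ecdf(data)[-1])  # (100, 1.0)
```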

Numeric Attributes, Pairs of Features Visualization - Answer-Scatter Plot

Categorical Attributes, Single Feature Visualization - Answer-Bar plot, dot plot
