Exam (elaborations)

Data Mining Exam #1 Questions with Latest Update

Pages
10
Grade
A+
Uploaded on
23-02-2025
Written in
2024/2025


Institution
Data Mining
Course
Data Mining











Co-occurrence grouping - Answer-Also known as frequent itemset mining, association
rule discovery, and market-basket analysis. It finds associations between entities
based on the transactions involving them.

Examples: product display placement, product recommendations (e.g., on Amazon), etc.
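
A minimal sketch of co-occurrence grouping, assuming a tiny set of made-up shopping baskets. The item names and the helper functions pair_support and confidence are hypothetical, chosen only to illustrate how associations can be counted from transactions:

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (made up for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def pair_support(baskets):
    """Return {(item_a, item_b): fraction of baskets containing both items}."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

support = pair_support(transactions)
print(support[("bread", "milk")])                   # share of baskets with both
print(confidence(transactions, "diapers", "beer"))  # rule: diapers -> beer
```

Pairs with high support and high confidence are the candidates a retailer might use for product placement or recommendations.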

Data reduction - Answer-Replacing a large data set with a smaller one that
retains much of the important information in the large set. This usually involves
some loss of information, so there is a trade-off between size and detail.
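
One simple form of reduction is aggregation. The sketch below assumes made-up purchase records and a hypothetical helper, reduce_to_customer_means; other reduction techniques (for example, factor-based methods) are not shown:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (customer_id, purchase_amount).
purchases = [
    ("c1", 20.0), ("c1", 30.0), ("c1", 40.0),
    ("c2", 5.0), ("c2", 15.0),
]

def reduce_to_customer_means(rows):
    """Collapse many purchase rows into one average amount per customer."""
    grouped = defaultdict(list)
    for customer_id, amount in rows:
        grouped[customer_id].append(amount)
    return {cid: mean(amounts) for cid, amounts in grouped.items()}

summary = reduce_to_customer_means(purchases)
print(summary)  # {'c1': 30.0, 'c2': 10.0}
```

Five rows become two, but the individual purchase amounts can no longer be recovered, which is the trade-off the answer describes.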

Goal of classification: - Answer-find a decision boundary (represented by a model)
that separates one class from the other.

Use of training data - Answer-To find a model that optimizes a predefined
objective
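
As a rough illustration of both ideas (a decision boundary, and training data used to optimize an objective), the sketch below fits a one-feature threshold classifier. The data, the error-rate objective, and the function names are assumptions made for this example, not a method named in the notes:

```python
# Hypothetical 1-D training set: (feature x, class label y).
training_data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]

def error_rate(threshold, data):
    """Objective: fraction of points misclassified by 'predict 1 if x > threshold'."""
    wrong = sum((x > threshold) != bool(y) for x, y in data)
    return wrong / len(data)

def fit_threshold(data):
    """Try midpoints between consecutive x values; keep the lowest-error one."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: error_rate(t, data))

boundary = fit_threshold(training_data)
print(boundary, error_rate(boundary, training_data))  # 4.5 0.0
```

Here the "model" is a single number (the boundary), and "training" means searching for the value that minimizes the chosen objective on the training data.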

Supervised learning - Answer-training data includes both the input (X) and the target
variable (Y)

Unsupervised learning - Answer-the model is NOT provided with the target variable
(Y) during training

-Classification
-Regression
-Data Reduction - Answer-Supervised Learning Examples:

-Clustering
-Co-occurrence Grouping
-Data Reduction - Answer-Unsupervised learning examples
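
To make the unsupervised case concrete, here is a minimal clustering sketch: a one-dimensional k-means with k = 2 on made-up values. No target variable (Y) is supplied; the function two_means and its min/max seeding are assumptions for illustration only:

```python
from statistics import mean

# Hypothetical unlabeled observations: only X, no target Y.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]

def two_means(xs, iterations=10):
    """Very small 1-D k-means with k=2, seeded at the min and max values."""
    centers = [min(xs), max(xs)]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            # Assign each point to the nearer of the two centers.
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Recompute each center; keep the old one if a group is empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

centers, groups = two_means(values)
print(centers)  # [1.5, 9.5]
print(groups)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```

The algorithm discovers the two groups from the inputs alone, whereas a supervised method would need a labeled Y for each observation.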

Why CRISP-DM? - Answer-Cross Industry Standard Process for Data Mining

The data mining process must be consistent, reliable and repeatable.

It provides a uniform framework for guidelines and for documenting experience.

CRISP-DM process - Answer-Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment

Phase 1: Business Understanding - Answer--Understanding the project objectives
and requirements from a business perspective

-Casting the business problem as one or more data mining (DM) problems and creating a
preliminary plan to achieve the objectives

Phase 2: Data Understanding - Answer--Starts with initial data collection
-Data are often collected for purposes unrelated to the current business problem.
This is very common in most companies.

-Proceeds with activities aimed at:
-Understanding the data: relevance, cost, and reliability
-Identifying data quality problems

Phases 1 & 2 - Answer-The initial formulation may not be complete, optimal, or
feasible, so multiple iterations may be necessary before an acceptable solution
formulation appears.

The goal is a successful data mining formulation that can later be solved with the
available data.

Phase 3: Data preparation - Answer-Can take over 90% of the time!

-Covers all activities needed to construct the final dataset (the data that will be fed
into the modeling tool(s)) from the initial raw data

Phase 4: Modeling - Answer-Selecting modeling techniques and calibrating their
parameters

Typically, there are several techniques for the same data mining problem type.

-Generate the test design, and test the model's quality and validity.

Phase 5: Evaluation - Answer--Review process
-Evaluate performance
-Choose the right evaluation metric
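
Why the choice of metric matters can be sketched with a made-up, imbalanced test set. The labels, the always-negative "model", and the helper functions below are assumptions for illustration:

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification result."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def recall(tp, fp, fn, tn):
    # Share of actual positives that the model found.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical imbalanced test labels: 2 positives out of 10.
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A "model" that always predicts the majority (negative) class.
predicted = [0] * 10

counts = confusion_counts(actual, predicted)
print(accuracy(*counts))  # 0.8
print(recall(*counts))    # 0.0
```

Accuracy looks respectable even though the model never finds a positive case, so a metric such as recall may be the better fit when the positive class is rare and important.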

Phase 6: Deployment - Answer-Determine how the results need to be utilized

-Who needs to use them?
-How often do they need to be utilized?

Data Preparation Steps on Rattle: - Answer-Step 1: Load the data and partition it
Step 2: Identify the correct type of each feature
Step 3: Deal with missing values
Step 4: Transform features into the correct form
Step 5: Identify the correct input and target variables
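
Rattle performs these steps through its interface; the sketch below only mirrors Steps 2 to 5 in plain Python on made-up rows. The column names, the mean-imputation choice, and the prepare function are assumptions for illustration, and partitioning (Step 1) is left out to keep it short:

```python
from statistics import mean

# Hypothetical raw rows as loaded from a CSV: all strings, "" means missing.
raw_rows = [
    {"age": "25", "income": "40000", "churn": "no"},
    {"age": "",   "income": "52000", "churn": "yes"},
    {"age": "41", "income": "61000", "churn": "no"},
    {"age": "35", "income": "",      "churn": "yes"},
]

NUMERIC = ["age", "income"]  # Step 2: which features are numeric
TARGET = "churn"             # Step 5: the target; the rest are inputs

def prepare(rows):
    # Step 2: convert numeric features from text, keeping None for missing.
    typed = [
        {**r, **{c: float(r[c]) if r[c] != "" else None for c in NUMERIC}}
        for r in rows
    ]
    # Step 3: impute missing numeric values with the column mean.
    for c in NUMERIC:
        col_mean = mean(r[c] for r in typed if r[c] is not None)
        for r in typed:
            if r[c] is None:
                r[c] = col_mean
    # Step 4: transform the target into 0/1 form.
    for r in typed:
        r[TARGET] = 1 if r[TARGET] == "yes" else 0
    # Step 5: separate the inputs X from the target y.
    X = [[r[c] for c in NUMERIC] for r in typed]
    y = [r[TARGET] for r in typed]
    return X, y

X, y = prepare(raw_rows)
print(X)
print(y)  # [0, 1, 0, 1]
```

In a real project the imputation values would normally be computed on the training partition only, so that information from held-out data does not leak into the model.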

Validation set: - Answer-used to tune parameters in models. Not all modeling
algorithms need a validation set

Test Set: - Answer-To assess the likely future performance of a model (test data
does not participate in the training or parameter tuning steps)
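
The three-way split can be sketched as follows. The 70/15/15 proportions, the fixed seed, and the partition function are assumptions chosen for illustration:

```python
import random

def partition(rows, train=0.70, validation=0.15, seed=42):
    """Shuffle and split rows into train / validation / test lists.

    The 70/15/15 proportions are an illustrative assumption.
    """
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                   # used to fit the model
        shuffled[n_train:n_train + n_valid],  # used to tune parameters
        shuffled[n_train + n_valid:],         # held out until the very end
    )

rows = list(range(100))  # stand-in for 100 data records
train_set, validation_set, test_set = partition(rows)
print(len(train_set), len(validation_set), len(test_set))  # 70 15 15
```

Because the test rows never take part in fitting or tuning, performance measured on them is a fairer estimate of how the model may behave on future data.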
