Data Mining Test 1 Questions with Complete Solutions

Written for

Institution: Data Mining
Course: Data Mining

Document information

Uploaded on: February 23, 2025
Number of pages: 5
Written in: 2024/2025
Type: Exam (elaborations)
Contains: Questions & answers

Pattern Evaluation - Answer-To identify the truly interesting patterns representing
knowledge based on interestingness measures.

Knowledge presentation - Answer-The step in which visualization and knowledge
representation techniques are used to present mined knowledge to users.

5-Number summary - Answer-Consists of the following: Minimum, Quartile 1 (Q1),
Median, Quartile 3 (Q3) and Max.
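
As a minimal sketch of the 5-number summary, the Python below computes all five values with the standard library. The function name five_number_summary and the sample values are hypothetical, and the "inclusive" quartile method is an assumption: other quartile conventions give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    # method="inclusive" is an assumed convention, not the only valid one.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical sample, used only for illustration.
data = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
print(five_number_summary(data))  # (30, 49.25, 54.0, 64.75, 110)
```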

Data sets with one, two, or three modes are respectively called: - Answer-Uni-modal,
Bi-modal, and Tri-modal.
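
The naming above can be illustrated with a short Python sketch that finds the most frequent values and labels the data set accordingly. The function name describe_modality is hypothetical. Treating a data set in which every value occurs exactly once as having no mode is a common convention and is assumed here.

```python
from collections import Counter

def describe_modality(values):
    """Return (sorted list of modes, label) for a non-empty list of values."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    top = max(counts.values())
    # Assumed convention: if no value repeats, the data set has no mode.
    if top == 1 and len(counts) > 1:
        return [], "no mode"
    modes = sorted(v for v, c in counts.items() if c == top)
    names = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return modes, names.get(len(modes), "multimodal")

print(describe_modality([1, 2, 2, 3, 3, 4]))  # ([2, 3], 'bimodal')
```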

How is the Interquartile Range calculated? - Answer-Quartile 3 minus Quartile 1
(IQR = Q3 - Q1).
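
A small Python sketch of the formula follows. The function name interquartile_range and the sample values are hypothetical, and the "inclusive" quartile method is an assumed convention.

```python
import statistics

def interquartile_range(values):
    """Return IQR = Q3 - Q1 for a list of numbers."""
    # The middle cut point (the median) is not needed for the IQR.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical sample: Q1 = 49.25 and Q3 = 64.75 under the assumed method.
data = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
print(interquartile_range(data))  # 15.5
```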

What are the primary factors that comprise data quality? - Answer-Accuracy,
completeness, consistency, timeliness, believability, and interpretability.

Data quality - Accuracy - Answer-Whether the recorded values are correct. Inaccurate
data can be caused by faulty instruments during data recording, human or computer
error, or users deliberately entering incorrect values (disguised missing data).

Data quality - Completeness - Answer-Whether required values are present. Missing
data can occur because the data were unavailable, were not recorded because they
were not considered useful at the time, or were lost to equipment malfunctions, etc.

Data quality - Timeliness - Answer-Whether data are recorded and updated on time.
For example, if sales representatives submit their sales records at different
intervals, the data used to determine bonuses for the top-performing
representatives are inaccurate until every record has been submitted.

Data Quality - Believability - Answer-Reflects how much the data are trusted by
users.

Data Quality - Interpretability - Answer-Reflects how easily the data are understood.

Machine learning - Answer-Investigates how computers can learn or improve their
performance based on data.

Supervised learning - Answer-Basically a synonym for classification. The supervision
in the learning comes from the labeled examples in the training data set. For
example, in the postal code recognition problem, a set of handwritten postal code
images and their corresponding machine-readable translations are used as the
training examples, which supervise the learning of the classification model.
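
To make the role of labeled examples concrete, here is a minimal sketch of a 1-nearest-neighbor classifier in Python. This technique is chosen only for illustration and is not named in the answer above. The function name, the two-dimensional feature vectors, and the labels "A" and "B" are all hypothetical.

```python
import math

def nearest_neighbor_predict(training_examples, query):
    """Predict the label of `query` from labeled (features, label) pairs."""
    # The labels in the training data are what "supervise" the prediction:
    # the query receives the label of its closest labeled example.
    _, label = min(training_examples, key=lambda ex: math.dist(ex[0], query))
    return label

# Hypothetical labeled training set.
training = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.2), "B"),
    ((4.8, 5.1), "B"),
]
print(nearest_neighbor_predict(training, (1.1, 0.9)))  # A
```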

Unsupervised learning - Answer-Essentially a synonym for clustering. The learning
process is unsupervised since the input examples are not class labeled. Typically,
we may use clustering to discover classes within the data. For example, an
unsupervised learning method can take, as input, a set of images of handwritten
digits. Suppose that it finds 10 clusters of data. These clusters may correspond
to the 10 distinct digits of 0 to 9, respectively.
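
The idea of discovering groups without labels can be sketched with a very small k-means implementation in Python. k-means is one common clustering method, used here only as an illustration. The function name, the sample points, and the choice of the first k points as initial centroids are assumptions made to keep the example short and deterministic.

```python
import math

def k_means(points, k, iterations=10):
    """Cluster unlabeled points; return (cluster index per point, centroids)."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return assignments, centroids

# Hypothetical unlabeled data with two visible groups.
points = [(1, 1), (5, 5), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
assignments, centroids = k_means(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

No class labels are supplied: the two groups emerge only from the distances between points.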

Semi-supervised learning - Answer-A class of machine learning techniques that
make use of both labeled and unlabeled examples when learning a model. In one
approach, labeled examples are used to learn class models and unlabeled examples
are used to refine the boundaries between classes. For a two-class problem, we can
think of the set of examples belonging to one class as the positive examples and
those belonging to the other class as the negative examples.

Active learning - Answer-A machine learning approach that lets users play an active
role in the learning process. An active learning approach can ask a user (e.g., a
domain expert) to label an example, which may be from a set of unlabeled examples
or synthesized by the learning program.

Outlier - Answer-A data object that does not comply with the general behavior or
model of the data.

Data discrimination - Answer-A comparison of the general features of the target class
data objects against the general features of objects from one or multiple contrasting
classes. The target and contrasting classes can be specified by a user, and the
corresponding data objects can be retrieved through database queries.

Data cube - Answer-A multidimensional data structure in which each dimension
corresponds to an attribute or a set of attributes in the schema, and each cell stores
the value of some aggregate measure such as count.
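
A minimal sketch of a data cube with count as the aggregate measure is shown below in Python. The function name count_cube, the dimension names (city, item, year), and the sales records are hypothetical. The sketch materializes every group-by combination, which is only practical for small examples.

```python
from collections import Counter
from itertools import combinations

def count_cube(records, dimensions):
    """Map each subset of dimensions to a Counter of cell -> count."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            # Each cell is identified by the values of the chosen dimensions
            # and stores how many records fall into it.
            cube[dims] = Counter(tuple(r[d] for d in dims) for r in records)
    return cube

# Hypothetical sales records.
sales = [
    {"city": "Boston", "item": "laptop", "year": 2024},
    {"city": "Boston", "item": "phone", "year": 2024},
    {"city": "Chicago", "item": "laptop", "year": 2024},
    {"city": "Chicago", "item": "laptop", "year": 2025},
]
cube = count_cube(sales, ("city", "item", "year"))
print(cube[("city",)][("Boston",)])                    # 2
print(cube[("city", "item")][("Chicago", "laptop")])  # 2
print(cube[()][()])                                    # 4, the grand total
```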

Cluster Analysis - Answer-Analyzes data objects without consulting class labels.
Clustering can be used to generate class labels for a group of data. Clusters of
objects are formed so that objects within a cluster have high similarity to one
another but are rather dissimilar to objects in other clusters.

Outlier Analysis - Answer-Rather than being discarded as noise, outliers can be
analyzed to reveal interesting behavior. A typical application is fraud detection.
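
One simple way to single out candidate outliers is the common 1.5 x IQR rule, sketched below in Python. The rule is used here only as an illustration and is not stated in the answer above. The function name iqr_outliers, the multiplier default k=1.5, the "inclusive" quartile method, and the transaction amounts are all assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values beyond either fence are flagged for inspection, not deleted.
    return [v for v in values if v < lower or v > upper]

# Hypothetical transaction amounts; fences are 26.0 and 88.0 here.
amounts = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
print(iqr_outliers(amounts))  # [110]
```

In a fraud detection setting, the flagged amount would be passed on for review rather than dropped.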

What are the 6 methods to handle Missing Values? - Answer-1. Ignore the tuple.
2. Fill in the missing value manually.
3. Use a global constant to fill in the missing value.
4. Use a measure of central tendency for the attribute (e.g., the mean or median) to
fill in the missing value.
5. Use the attribute mean or median for all samples belonging to the same class as
the given tuple.
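
Methods 1, 3, 4, and 5 from the list above can be sketched in Python as follows. All function names, the attribute names (income, risk), the "Unknown" constant, and the sample rows are hypothetical. Missing values are assumed to be represented as None, and the class-mean helper assumes every class has at least one observed value.

```python
import statistics

def ignore_tuples(rows, attr):
    """Method 1: drop tuples whose value for `attr` is missing."""
    return [r for r in rows if r[attr] is not None]

def fill_constant(rows, attr, constant="Unknown"):
    """Method 3: replace missing values with a global constant."""
    return [{**r, attr: constant if r[attr] is None else r[attr]} for r in rows]

def fill_mean(rows, attr):
    """Method 4: replace missing values with the attribute mean."""
    mean = statistics.mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: mean if r[attr] is None else r[attr]} for r in rows]

def fill_class_mean(rows, attr, class_attr):
    """Method 5: replace missing values with the mean of the same class."""
    by_class = {}
    for r in rows:
        if r[attr] is not None:
            by_class.setdefault(r[class_attr], []).append(r[attr])
    means = {c: statistics.mean(v) for c, v in by_class.items()}
    return [
        {**r, attr: means[r[class_attr]] if r[attr] is None else r[attr]}
        for r in rows
    ]

# Hypothetical customer tuples; None marks a missing income.
rows = [
    {"income": 40, "risk": "low"},
    {"income": 60, "risk": "low"},
    {"income": None, "risk": "low"},
    {"income": 20, "risk": "high"},
    {"income": None, "risk": "high"},
]
print(len(ignore_tuples(rows, "income")))                   # 3
print(fill_mean(rows, "income")[2]["income"])               # 40
print(fill_class_mean(rows, "income", "risk")[2]["income"]) # 50
```

Each helper returns new rows and leaves the input unchanged, so the strategies can be compared on the same data.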
