Exam (elaborations)

Data Mining Test 1 Questions with Complete Solutions

Pages
5
Grade
A+
Uploaded on
23-02-2025
Written in
2024/2025

Institution
Data Mining
Course
Data Mining










Contains
Questions & answers


Content preview

Data Mining Test 1 Questions with
Complete Solutions
Pattern Evaluation - Answer-To identify the truly interesting patterns representing
knowledge based on interestingness measures.

Knowledge presentation - Answer-Where visualization and knowledge representation
techniques are used to present mined knowledge to users.

5-Number summary - Answer-Consists of the following: Minimum, Quartile 1 (Q1),
Median, Quartile 3 (Q3) and Max.
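
As an illustration, the five-number summary can be computed with Python's standard library. This is a minimal sketch, not part of the original notes: the function name and sample data are made up, and the "inclusive" quartile method is only one of several conventions, so other tools may report slightly different Q1 and Q3 values.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers."""
    ordered = sorted(values)
    # method="inclusive" interpolates between observed values, so the
    # quartiles never fall outside the observed minimum and maximum.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

scores = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # hypothetical sample
print(five_number_summary(scores))
```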

Data sets with one, two, or three modes are respectively called: - Answer-Unimodal,
bimodal, and trimodal.
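
A small sketch of how the number of modes could be checked in Python. The function name and the "multimodal" fallback label are assumptions made for illustration, and the sketch does not special-case data in which every value occurs exactly once.

```python
import statistics

def modality(values):
    """Name a data set by how many values tie for the highest frequency."""
    if not values:
        raise ValueError("modality() needs at least one value")
    modes = statistics.multimode(values)  # every value with the top count
    names = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return names.get(len(modes), "multimodal"), modes

print(modality([2, 4, 4, 6, 6, 9]))  # 4 and 6 both occur twice
```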

How is the Interquartile Range calculated? - Answer-Quartile 3 minus Quartile 1
(IQR = Q3 - Q1).
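
The formula translates directly into code. A minimal sketch, assuming the "inclusive" quartile convention from Python's statistics module (other conventions give slightly different quartiles); the function name is illustrative.

```python
import statistics

def interquartile_range(values):
    """IQR = Q3 - Q1, using inclusive quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # Q1 = 3, Q3 = 7
```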

What are the primary factors that comprise data quality? - Answer-Accuracy,
completeness, consistency, timeliness, believability, and interpretability.

Data quality - Accuracy - Answer-Whether the recorded values are correct. Inaccurate
data (incorrect attribute values) can be caused by faulty instruments during data
recording, human or computer error at data entry, or users deliberately entering
incorrect values (disguised missing data).

Data quality - Completeness - Answer-Whether any data are missing. Data can be
incomplete because the values were unavailable, because they were not recorded when
they did not seem useful at the time, or because of equipment malfunctions, etc.

Data quality - Timeliness - Answer-Whether data are recorded and updated on schedule.
For example, if sales representatives submit their sales records at different
intervals, the data used to determine bonuses for the top-performing sales
representatives can be inaccurate at the time the bonuses are calculated.

Data Quality - Believability - Answer-Reflects how much the data are trusted by
users.

Data Quality - Interpretability - Answer-Reflects how easily the data are understood.

Machine learning - Answer-Investigates how computers can learn or improve their
performance based on data.

Supervised learning - Answer-Basically a synonym for classification. The supervision
in the learning comes from the labeled examples in the training data set. For
example, in the postal code recognition problem, a set of handwritten postal code
images and their corresponding machine-readable translations are used as the
training examples, which supervise the learning of the classification model.
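
To make the role of labeled examples concrete, here is a minimal sketch of a classifier that predicts the label of the closest training example. The nearest-neighbor technique, the function name, and the toy data are assumptions chosen for illustration; the notes do not prescribe a particular classifier.

```python
import math

def nearest_neighbor_predict(training_examples, query):
    """Predict the label of the training example closest to the query.

    training_examples is a list of (feature_vector, label) pairs; the labels
    are the "supervision" that the learner relies on.
    """
    _features, label = min(
        training_examples, key=lambda example: math.dist(example[0], query)
    )
    return label

# Hypothetical labeled training data: two features per example.
training = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.1), "B"),
    ((4.8, 5.3), "B"),
]
print(nearest_neighbor_predict(training, (1.1, 0.9)))
```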

Unsupervised learning - Answer-Essentially a synonym for clustering. The learning
process is unsupervised since the input examples are not class labeled. Typically,
we may use clustering to discover classes within the data. For example, an
unsupervised learning method can take, as input, a set of images of handwritten
digits. Suppose that it finds 10 clusters of data. These clusters may correspond to
the 10 distinct digits of 0 to 9, respectively.

Semi-supervised learning - Answer-A class of machine learning techniques that
make use of both labeled and unlabeled examples when learning a model. In one approach,
labeled examples are used to learn class models and unlabeled examples are used
to refine the boundaries between classes. For a two-class problem, we can think of
the set of examples belonging to one class as the positive examples and those
belonging to the other class as the negative examples.
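
One simple way to picture this is self-training on one-dimensional data: class means are learned from the labeled examples, each unlabeled example receives the label of the nearest class mean, and the means are then recomputed. The sketch below is an assumption-laden illustration (self-training, nearest-mean models, made-up data and names), not necessarily the approach the notes have in mind.

```python
def class_means(points_by_class):
    """Mean value of each class: a very simple 'class model'."""
    return {label: sum(pts) / len(pts) for label, pts in points_by_class.items()}

def nearest_class(means, x):
    """Label of the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

def self_train(labeled, unlabeled):
    """Refine class means with pseudo-labeled examples (one pass)."""
    by_class = {}
    for x, label in labeled:
        by_class.setdefault(label, []).append(x)
    means = class_means(by_class)      # class models from labeled data only
    for x in unlabeled:                # pseudo-label each unlabeled example
        by_class[nearest_class(means, x)].append(x)
    return class_means(by_class)       # refined class models

labeled = [(1.0, "neg"), (9.0, "pos")]  # hypothetical labeled examples
unlabeled = [2.0, 3.0, 8.0]             # hypothetical unlabeled examples
refined = self_train(labeled, unlabeled)
print(refined)
```

With these numbers the boundary between the two classes, taken as the midpoint of the class means, moves from 5.0 to 5.25 once the unlabeled examples are taken into account.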

Active learning - Answer-A machine learning approach that lets users play an active
role in the learning process. An active learning approach can ask a user (e.g., a
domain expert) to label an example, which may be from a set of unlabeled examples
or synthesized by the learning program.
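
A minimal sketch of one active learning round, assuming a strategy often called uncertainty sampling: the learner asks about the unlabeled example closest to its current decision boundary. The strategy, the one-dimensional data, and the expert function standing in for a human are all illustrative assumptions.

```python
def decision_boundary(labeled):
    """Midpoint between the mean 'neg' value and the mean 'pos' value."""
    neg = [x for x, y in labeled if y == "neg"]
    pos = [x for x, y in labeled if y == "pos"]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2

def query_next(labeled, unlabeled, ask_user):
    """Ask the user to label the example the model is least sure about."""
    boundary = decision_boundary(labeled)
    candidate = min(unlabeled, key=lambda x: abs(x - boundary))
    new_labeled = labeled + [(candidate, ask_user(candidate))]
    remaining = [x for x in unlabeled if x != candidate]
    return new_labeled, remaining

def expert(x):
    """Stand-in for a domain expert; a real system would prompt a person."""
    return "pos" if x >= 4.0 else "neg"

labeled = [(1.0, "neg"), (9.0, "pos")]
unlabeled = [2.0, 4.5, 8.0]
labeled, unlabeled = query_next(labeled, unlabeled, expert)
print(labeled, unlabeled)
```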

Outlier - Answer-A data object that does not comply with the general behavior or
model of the data.

Data discrimination - Answer-A comparison of the general features of the target class
data objects against the general features of objects from one or multiple contrasting
classes. The target and contrasting classes can be specified by a user, and the
corresponding data objects can be retrieved through database queries.
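
As a rough sketch, such a comparison can be as simple as putting a summary statistic of each feature side by side for the two classes. The use of the mean, the record layout, and the sample values below are assumptions for illustration only.

```python
from statistics import mean

def discriminate(records, class_key, target, contrasting, features):
    """Return {feature: (target-class mean, contrasting-class mean)}."""
    def class_mean(cls, feature):
        return mean(r[feature] for r in records if r[class_key] == cls)
    return {f: (class_mean(target, f), class_mean(contrasting, f)) for f in features}

# Hypothetical records: products whose sales went up versus down.
products = [
    {"trend": "up", "price": 10, "rating": 4.5},
    {"trend": "up", "price": 20, "rating": 4.0},
    {"trend": "down", "price": 35, "rating": 3.0},
    {"trend": "down", "price": 45, "rating": 2.5},
]
comparison = discriminate(products, "trend", "up", "down", ["price", "rating"])
print(comparison)
```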

Data cube - Answer-A multidimensional data structure in which each dimension
corresponds to an attribute or a set of attributes in the schema, and each cell stores
the value of some aggregate measure such as count.
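
A tiny count cube can be sketched with plain dictionaries: for every subset of the dimensions, each cell holds the number of records that share those attribute values. The function name and the sample records are hypothetical, and real systems use far more efficient structures.

```python
from collections import Counter
from itertools import combinations

def count_cube(records, dimensions):
    """Map each group-by (a tuple of dimensions) to a Counter of cell counts."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            # Each cell is keyed by the record's values on the chosen dimensions.
            cube[dims] = Counter(tuple(r[d] for d in dims) for r in records)
    return cube

# Hypothetical sales records.
sales = [
    {"city": "Boston", "year": 2024},
    {"city": "Boston", "year": 2025},
    {"city": "Chicago", "year": 2024},
]
cube = count_cube(sales, ["city", "year"])
print(cube[("city",)])  # counts per city
print(cube[()])         # the grand total lives in the empty group-by
```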

Cluster Analysis - Answer-Analyzes data objects without consulting class labels.
Clustering can be used to generate class labels for a group of data. Clusters of
objects are formed so that objects within a cluster have high similarity in comparison
to one another, but are rather dissimilar to objects in other clusters.
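
The idea can be sketched with a bare-bones k-means loop on one-dimensional data. K-means is just one clustering method, chosen here as an assumption for illustration; the starting centers and data are made up, and a practical implementation would choose starting centers more carefully and stop once the assignments no longer change.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster numbers around the given starting centers; no labels are used."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its most similar (closest) center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

values = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]  # hypothetical data
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```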

Outlier Analysis - Answer-Rather than discarding outliers as noise, they can be used
to observe interesting behaviors. A typical application is fraud detection.
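
As a hedged example of how unusual values might be surfaced for review, the sketch below flags values more than 1.5 × IQR beyond the quartiles. This rule of thumb is an assumption brought in for illustration and is not stated in the notes; the transaction amounts are invented.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

amounts = [20, 22, 25, 21, 23, 24, 500]  # hypothetical transaction amounts
print(iqr_outliers(amounts))  # the 500 transaction stands out for inspection
```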

What are the 6 methods to handle missing values? - Answer-1. Ignore the tuple.
2. Fill in the missing value manually.
3. Use a global constant to fill in the missing value.
4. Use a measure of central tendency for the attribute (e.g., the mean or median) to
fill in the missing value.
5. Use the attribute mean or median for all samples belonging to the same class as
the given tuple.
6. Use the most probable value to fill in the missing value (e.g., a value predicted
by regression or decision tree induction).
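
Methods 1, 3, 4, and 5 from the list above are easy to sketch in a few lines. The sketch assumes missing values are stored as None in a list of dictionaries; the function names and customer data are hypothetical, and method 5 would raise a KeyError for a class with no known values.

```python
from statistics import mean

def ignore_tuples(rows, attr):
    """Method 1: drop every tuple whose value for attr is missing."""
    return [r for r in rows if r[attr] is not None]

def fill_with_constant(rows, attr, constant):
    """Method 3: replace missing values with one global constant."""
    return [{**r, attr: constant} if r[attr] is None else r for r in rows]

def fill_with_mean(rows, attr):
    """Method 4: replace missing values with the attribute mean."""
    overall = mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: overall} if r[attr] is None else r for r in rows]

def fill_with_class_mean(rows, attr, class_attr):
    """Method 5: replace missing values with the mean of the tuple's own class."""
    known = {}
    for r in rows:
        if r[attr] is not None:
            known.setdefault(r[class_attr], []).append(r[attr])
    return [
        {**r, attr: mean(known[r[class_attr]])} if r[attr] is None else r
        for r in rows
    ]

customers = [
    {"risk": "low", "income": 50},
    {"risk": "low", "income": None},  # the missing value
    {"risk": "high", "income": 20},
    {"risk": "high", "income": 30},
    {"risk": "low", "income": 70},
]
print(fill_with_mean(customers, "income")[1])
print(fill_with_class_mean(customers, "income", "risk")[1])
```

The class-based fill uses only the "low" risk customers (mean 60) instead of the overall mean (42.5), which is why method 5 is usually the more faithful of the two when classes differ.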

Get to know the seller

lectknancy Boston University