Exam (elaborations)

DATA MINING ASSESSMENT TEST 1 QUESTIONS WITH COMPLETE SOLUTIONS

Pages: 5
Uploaded on: 26-03-2025
Written in: 2024/2025


Institution: DATA MINING
Course: DATA MINING









Semi-supervised learning - Answer-A class of machine learning techniques that make
use of both labeled and unlabeled examples when learning a model. In one approach,
labeled examples are used to learn class models and unlabeled examples are used
to refine the boundaries between classes. For a two-class problem, we can think of
the set of examples belonging to one class as the positive examples and those
belonging to the other class as the negative examples.
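The approach above (learn class models from labeled examples, then refine the boundary with unlabeled ones) can be sketched as simple self-training. This is a minimal illustration, assuming one-dimensional numeric examples and a hypothetical nearest-centroid class model; self-training is only one of several semi-supervised techniques.

```python
from statistics import mean

def centroids(points):
    """Class model: the mean value of each class, from (x, label) pairs."""
    by_class = {}
    for x, y in points:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}

def predict(x, cents):
    """Assign x to the class whose centroid is nearest."""
    return min(cents, key=lambda y: abs(x - cents[y]))

def self_train(labeled, unlabeled, rounds=5):
    """Learn centroids from labeled data, pseudo-label the unlabeled data,
    and refit on both until the centroids stop moving."""
    cents = centroids(labeled)
    for _ in range(rounds):
        pseudo = [(x, predict(x, cents)) for x in unlabeled]
        new = centroids(labeled + pseudo)
        if new == cents:
            break
        cents = new
    return cents

labeled = [(1.0, "neg"), (9.0, "pos")]        # hypothetical labeled examples
unlabeled = [2.0, 3.0, 4.0, 6.5, 7.0, 8.0]    # hypothetical unlabeled examples
cents = self_train(labeled, unlabeled)
print(cents, predict(5.0, cents))  # → {'neg': 2.5, 'pos': 7.625} neg
```

Using the unlabeled points moves the class boundary (the midpoint between the two centroids) from 5.0 to 5.0625.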

Active learning - Answer-A machine learning approach that lets users play an active
role in the learning process. An active learning approach can ask a user (e.g., a
domain expert) to label an example, which may come from a set of unlabeled examples
or be synthesized by the learning program.
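A pool-based version of this loop can be sketched with uncertainty sampling: each round, the learner asks the user to label the unlabeled example it is least sure about. This is a minimal sketch, assuming one-dimensional data, a simple threshold model, and a hypothetical `expert` function standing in for the domain expert.

```python
from statistics import mean

def boundary(labeled):
    """Threshold model: midpoint between the class means of (x, label) pairs."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (mean(pos) + mean(neg)) / 2

def active_learn(labeled, pool, oracle, budget=3):
    """Uncertainty sampling: each round, ask the oracle to label the pool
    example closest to the current decision boundary."""
    labeled, pool, queried = list(labeled), list(pool), []
    for _ in range(min(budget, len(pool))):
        t = boundary(labeled)
        x = min(pool, key=lambda p: abs(p - t))  # least certain example
        pool.remove(x)
        labeled.append((x, oracle(x)))           # the user supplies the label
        queried.append(x)
    return boundary(labeled), queried

def expert(x):
    """Hypothetical domain expert; the true rule (x >= 6) is hidden from the learner."""
    return int(x >= 6)

threshold, asked = active_learn([(0.0, 0), (10.0, 1)], [1, 3, 4.5, 5.5, 6.5, 9], expert)
print(asked)  # → [4.5, 6.5, 5.5]
```

Only three labels are requested, all near the true boundary, rather than labeling the whole pool.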

Outlier - Answer-A data object that does not comply with the general behavior or
model of the data. A data set may contain such objects.

Data discrimination - Answer-a comparison of the general features of the target class
data objects against the general features of objects from one or multiple contrasting
classes. The target and contrasting classes can be specified by a user, and the
corresponding data objects can be retrieved through database queries.

Data cube - Answer-A multidimensional data structure in which each dimension
corresponds to an attribute or a set of attributes in the schema, and each cell stores
the value of some aggregate measure such as count.
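A minimal sketch of how a count-based data cube can be materialized, assuming a small hypothetical sales table. Each cell key holds one value per dimension, with `*` marking a dimension that has been aggregated away, so every group-by over every subset of dimensions is stored.

```python
from collections import Counter
from itertools import combinations

def build_cube(rows, dims):
    """Count rows for every combination of dimension values over every
    subset of dims; dimensions left out of a subset are shown as '*'."""
    cube = Counter()
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            for row in rows:
                cell = tuple(row[d] if d in subset else "*" for d in dims)
                cube[cell] += 1
    return cube

sales = [  # hypothetical fact table
    {"city": "Vancouver", "item": "TV", "quarter": "Q1"},
    {"city": "Vancouver", "item": "phone", "quarter": "Q1"},
    {"city": "Toronto", "item": "TV", "quarter": "Q2"},
]
cube = build_cube(sales, ["city", "item", "quarter"])
print(cube[("Vancouver", "*", "Q1")])  # → 2
print(cube[("*", "TV", "*")])          # → 2
print(cube[("*", "*", "*")])           # → 3
```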

Cluster Analysis - Answer-Analyzes data objects without consulting class labels.
Clustering can be used to generate class labels for a group of data. Clusters of
objects are formed so that objects within a cluster have high similarity to one
another but are dissimilar to objects in other clusters.
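As an illustration, k-means (one common clustering method, not named in the source) forms clusters by repeatedly assigning each object to its nearest cluster center. A minimal one-dimensional sketch, assuming the number of clusters and the starting centers are given:

```python
from statistics import mean

def kmeans_1d(points, centers, iters=10):
    """Assign each point to its nearest center, move each center to the mean
    of its cluster, and stop once the centers no longer change."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # keep an empty cluster's center where it was
        new = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[1, 2])
print(centers, clusters)  # → [2, 11] [[1, 2, 3], [10, 11, 12]]
```

No class labels are used: the two groups emerge only from the similarity (distance) between objects.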

Outlier Analysis - Answer-Rather than being discarded as noise, outliers can be
analyzed to reveal interesting behavior. A typical application is fraud detection.
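One simple way to surface such outliers is a z-score test that flags values lying far from the mean in standard-deviation units. A minimal sketch, assuming numeric data and a hypothetical list of transaction amounts; the threshold of 2 is an assumption, not a fixed rule.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    # if sigma is 0, all values are equal and nothing is flagged
    return [v for v in values if sigma and abs(v - mu) / sigma > threshold]

amounts = [20, 25, 22, 19, 24, 21, 23, 500]  # hypothetical transaction amounts
print(zscore_outliers(amounts))  # → [500]
```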

What are the 6 methods to handle Missing Values? - Answer-1. Ignore the tuple.
2. Fill in the missing value manually.
3. Use a global constant to fill in the missing value.
4. Use a measure of central tendency for the attribute (e.g., the mean or median) to fill in the missing value.
5. Use the attribute mean or median for all samples belonging to the same class as the given tuple.
6. Use the most probable value to fill in the missing value (may be determined with regression, inference-based tools using a Bayesian formalism, or decision tree induction).
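Methods 1, 3, 4, and 5 can be sketched in a few lines, assuming records are dictionaries in which `None` marks a missing value; the attribute names and data are hypothetical.

```python
from statistics import mean, median

def drop_missing(rows, attr):
    """Method 1: ignore (drop) tuples whose value for attr is missing."""
    return [r for r in rows if r[attr] is not None]

def fill_constant(rows, attr, const):
    """Method 3: replace missing values with one global constant."""
    return [{**r, attr: const if r[attr] is None else r[attr]} for r in rows]

def fill_central(rows, attr, stat=mean):
    """Method 4: replace missing values with the attribute's mean (or median)."""
    return fill_constant(rows, attr, stat([r[attr] for r in drop_missing(rows, attr)]))

def fill_class_central(rows, attr, cls_attr, stat=mean):
    """Method 5: use the mean (or median) of tuples in the same class.
    Assumes every class has at least one non-missing value."""
    by_cls = {}
    for r in drop_missing(rows, attr):
        by_cls.setdefault(r[cls_attr], []).append(r[attr])
    fill = {c: stat(vals) for c, vals in by_cls.items()}
    return [{**r, attr: fill[r[cls_attr]] if r[attr] is None else r[attr]} for r in rows]

rows = [  # hypothetical data; None = missing
    {"cls": "A", "income": 30.0}, {"cls": "A", "income": None}, {"cls": "A", "income": 34.0},
    {"cls": "B", "income": 80.0}, {"cls": "B", "income": None}, {"cls": "B", "income": 90.0},
]
print([r["income"] for r in fill_central(rows, "income")])               # → [30.0, 58.5, 34.0, 80.0, 58.5, 90.0]
print([r["income"] for r in fill_central(rows, "income", median)])       # → [30.0, 57.0, 34.0, 80.0, 57.0, 90.0]
print([r["income"] for r in fill_class_central(rows, "income", "cls")])  # → [30.0, 32.0, 34.0, 80.0, 85.0, 90.0]
```

The class-wise fill (method 5) gives values much closer to each group's typical income than the global mean does when the classes differ strongly.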

Noise - Answer-A random error or variance in a measured variable.

Data mining functionality: Characterization - Answer-A summarization of the general
characteristics or features of a target class of data. (e.g., a profile of all
first-year computing science students at the University can be generated, which may
include information such as a high GPA and a large number of courses taken.)
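A minimal sketch of characterization as a summary of one target class, assuming hypothetical student records and using the mean as the summary statistic:

```python
from statistics import mean

students = [  # hypothetical records
    {"year": 1, "major": "CS", "gpa": 3.5, "courses": 6},
    {"year": 1, "major": "CS", "gpa": 3.75, "courses": 5},
    {"year": 1, "major": "Math", "gpa": 3.0, "courses": 4},
    {"year": 4, "major": "CS", "gpa": 3.2, "courses": 4},
]

def characterize(rows, is_target, attrs):
    """Profile the target class: its size and the mean of each listed attribute."""
    target = [r for r in rows if is_target(r)]
    profile = {a: mean([r[a] for r in target]) for a in attrs}
    profile["count"] = len(target)
    return profile

profile = characterize(students, lambda r: r["year"] == 1 and r["major"] == "CS", ["gpa", "courses"])
print(profile)  # → {'gpa': 3.625, 'courses': 5.5, 'count': 2}
```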

Data mining functionality: Discrimination - Answer-A comparison of the general
features of target class data objects with the general features of objects from one or
a set of contrasting classes. (e.g., the general features of students with high GPAs
may be compared with the general features of students with low GPAs. The
resulting description could be a general comparative profile of the students, stating
that 75% of the students with high GPAs are fourth-year computing science students,
while 65% of the students with low GPAs are not.)
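The comparison can be sketched as measuring how common one feature is in the target class versus the contrasting class. This assumes hypothetical student records and illustrative GPA cut-offs (at least 3.5 for "high", below 3.0 for "low"):

```python
def share(rows, pred):
    """Fraction of rows satisfying pred (0.0 for an empty group)."""
    return sum(pred(r) for r in rows) / len(rows) if rows else 0.0

def discriminate(rows, is_target, is_contrast, feature):
    """Return the feature's frequency in the target and in the contrasting class."""
    target = [r for r in rows if is_target(r)]
    contrast = [r for r in rows if is_contrast(r)]
    return share(target, feature), share(contrast, feature)

students = [  # hypothetical records
    {"year": 4, "major": "CS", "gpa": 3.9},
    {"year": 4, "major": "CS", "gpa": 3.7},
    {"year": 2, "major": "Math", "gpa": 3.6},
    {"year": 4, "major": "CS", "gpa": 2.1},
    {"year": 1, "major": "CS", "gpa": 2.4},
    {"year": 3, "major": "Bio", "gpa": 2.8},
    {"year": 2, "major": "CS", "gpa": 2.0},
]
high, low = discriminate(
    students,
    is_target=lambda r: r["gpa"] >= 3.5,
    is_contrast=lambda r: r["gpa"] < 3.0,
    feature=lambda r: r["year"] == 4 and r["major"] == "CS",
)
print(f"{high:.0%} vs {low:.0%}")  # → 67% vs 25%
```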

Pattern Evaluation - Answer-Identifies the truly interesting patterns representing
knowledge, based on interestingness measures.
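Support and confidence are two widely used interestingness measures for association rules (the source does not name specific measures). A minimal sketch, assuming hypothetical market-basket transactions and illustrative thresholds, that keeps only the rules passing both:

```python
from itertools import permutations

transactions = [  # hypothetical market-basket data
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def interesting_rules(min_sup, min_conf):
    """Evaluate every single-item rule A -> B and keep those that pass both thresholds."""
    items = sorted(set().union(*transactions))
    kept = []
    for a, b in permutations(items, 2):
        sup = support({a, b})
        conf = sup / support({a})  # confidence = P(B | A)
        if sup >= min_sup and conf >= min_conf:
            kept.append((a, b, sup, conf))
    return kept

print(interesting_rules(min_sup=0.5, min_conf=0.7))  # → [('butter', 'bread', 0.5, 1.0)]
```

Of the six candidate rules, only butter -> bread clears both thresholds; the rest are filtered out as uninteresting.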

Knowledge presentation - Answer-The step in which visualization and knowledge
representation techniques are used to present mined knowledge to users.

5-Number summary - Answer-Consists of the following: Minimum, Quartile 1 (Q1),
Median, Quartile 3 (Q3), and Maximum.
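A short sketch of the five-number summary using Python's `statistics.quantiles`. Quartile conventions differ between tools; this assumes the "inclusive" method, so other software may report slightly different Q1 and Q3 values.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

print(five_number_summary([1, 3, 5, 7, 9, 11, 13, 15, 17]))  # → (1, 5.0, 9.0, 13.0, 17)
```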

Data sets with one, two, or three modes are respectively called: - Answer-Unimodal,
bimodal, and trimodal.
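The modes of a data set can be found with `statistics.multimode` (Python 3.8+), which returns every most-frequent value. A small sketch that names the data set by its number of modes; the label for more than three modes is an assumption of this sketch.

```python
from statistics import multimode

def modality(data):
    """Return the modes of data and the name for that number of modes."""
    names = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    modes = multimode(data)
    return modes, names.get(len(modes), "multimodal")

print(modality([1, 2, 2, 3]))           # → ([2], 'unimodal')
print(modality([1, 1, 2, 2, 3]))        # → ([1, 2], 'bimodal')
print(modality([1, 1, 2, 2, 3, 3, 4]))  # → ([1, 2, 3], 'trimodal')
```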

How is the Interquartile Range calculated? - Answer-Quartile 3 minus Quartile 1
(IQR = Q3 - Q1).
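A short sketch computing the IQR, assuming the `statistics.quantiles` "inclusive" quartile convention. It also applies the widely used 1.5 × IQR rule of thumb for flagging suspected outliers, which goes beyond the definition above.

```python
from statistics import quantiles

def iqr_fences(data, k=1.5):
    """Return IQR = Q3 - Q1 and the lower/upper k * IQR outlier fences."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

data = [1, 3, 5, 7, 9, 11, 13, 15, 40]
iqr, low, high = iqr_fences(data)
print(iqr, [x for x in data if x < low or x > high])  # → 8.0 [40]
```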

What are the primary factors that comprise data quality? - Answer-Accuracy,
completeness, consistency, timeliness, believability, and interpretability.

Data quality - Accuracy - Answer-Concerns inaccurate data (incorrect attribute values),
which often occurs together with incomplete and inconsistent data. Inaccuracy can be
caused by faulty instruments during data recording, human or computer error, or users
entering disguised missing data (intentionally inaccurate values).

Data quality - Completeness - Answer-Concerns missing data. Data can be incomplete
because it is unavailable, because it was not recorded since it was not considered
useful at the time of recording, because of equipment malfunctions, etc.
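Completeness can be quantified per attribute as the share of values that are present. A minimal sketch, assuming records are dictionaries in which `None` (or an absent key) marks a missing value; the fields and data are hypothetical.

```python
def completeness(rows, attrs):
    """Share of non-missing values for each attribute."""
    return {a: sum(r.get(a) is not None for r in rows) / len(rows) for a in attrs}

records = [  # hypothetical customer records
    {"name": "Ann", "phone": "555-0100", "income": None},
    {"name": "Bo", "phone": None, "income": None},
    {"name": "Cy", "phone": "555-0101", "income": 52000},
    {"name": "Di", "phone": None, "income": 61000},
]
print(completeness(records, ["name", "phone", "income"]))  # → {'name': 1.0, 'phone': 0.5, 'income': 0.5}
```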

Data quality - Timeliness - Answer-Whether data is recorded and submitted on time
can impact its quality. For example, if sales representatives submit their sales records
at different times (some after the end of the month), the data used to determine sales
bonuses for the top-performing sales representatives will be inaccurate or incomplete
until all records are in.

Seller: biggdreamer, Havard School