DATA MINING FINAL STUDY GUIDE
TEST QUESTIONS AND ANSWERS
The ___ OLAP operation performs aggregation on a data cube, either by climbing up
a concept hierarchy for a dimension or by dimension reduction.
-roll-up
-drill-down
-slice
-rotate - Answer--roll-up
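As a rough illustration of the two forms of roll-up named in the question, here is a minimal Python sketch on a toy fact table. The city/quarter/sales rows and the city-to-country hierarchy are hypothetical and were chosen only to show climbing a concept hierarchy versus removing a dimension.

```python
from collections import defaultdict

# Hypothetical fact table: (city, quarter, sales). Illustrative data only.
facts = [
    ("Toronto", "Q1", 100),
    ("Vancouver", "Q1", 50),
    ("Chicago", "Q1", 80),
    ("Toronto", "Q2", 120),
    ("Chicago", "Q2", 70),
]

# Hypothetical concept hierarchy for the location dimension: city -> country.
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}


def roll_up_hierarchy(rows, hierarchy):
    """Roll up by climbing the location hierarchy (city -> country)."""
    totals = defaultdict(int)
    for city, quarter, sales in rows:
        totals[(hierarchy[city], quarter)] += sales
    return dict(totals)


def roll_up_drop_dimension(rows):
    """Roll up by dimension reduction: aggregate the quarter dimension away."""
    totals = defaultdict(int)
    for city, _quarter, sales in rows:
        totals[city] += sales
    return dict(totals)


print(roll_up_hierarchy(facts, city_to_country))
print(roll_up_drop_dimension(facts))
```

Both functions produce a coarser view with fewer cells; drill-down would go in the opposite direction (for example, from quarters back to months).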
Among the data warehouse applications, ___ applications support knowledge
discovery.
-information processing
-analytical processing
-data mining
-star schema - Answer--data mining
An advantage of using the spiral method to design and construct a data warehouse is
that ___.
-the process moves on to each phase only if the previous phase is complete
-it requires fewer resources
-risks are managed late in the process
-modifications can be done quickly - Answer--modifications can be done quickly
How many cuboids are there in a 6-dimensional data cube if there are no
hierarchies associated with any dimension? - Answer-64 (2^6 = 64)
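The count can be checked by enumerating every subset of dimensions, because each cuboid corresponds to one group-by. Below is a small Python sketch. The dimension names d1 to d6 are hypothetical placeholders. It also includes the general count for dimensions with hierarchies, the product of (L_i + 1) over all dimensions, where L_i is the number of levels of dimension i and the +1 stands for the virtual top level "all".

```python
from itertools import combinations
from math import prod


def enumerate_cuboids(dimensions):
    """List every cuboid (group-by) as a tuple of the dimensions it keeps."""
    cuboids = []
    for k in range(len(dimensions) + 1):
        cuboids.extend(combinations(dimensions, k))
    return cuboids


def total_cuboids(levels_per_dimension):
    """Count cuboids when dimension i has L_i hierarchy levels: prod(L_i + 1)."""
    return prod(levels + 1 for levels in levels_per_dimension)


dims = ["d1", "d2", "d3", "d4", "d5", "d6"]  # hypothetical dimension names
print(len(enumerate_cuboids(dims)))  # 64
print(total_cuboids([1] * 6))        # 64, i.e. 2**6
```

With no hierarchies each dimension has a single level, so every factor is 2 and the product collapses to 2^6. The empty tuple is the apex cuboid, and the full 6-tuple is the base cuboid.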
It is often useful to summarize data by replacing low-level values with higher-level
concepts. For example, an instructor may prefer to view students' grades summarized
into grade groups, say, "passing" and "not passing". That process is known as ___.
-data generalization
-full materialization
-curse of dimensionality
-partial materialization - Answer--data generalization
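Here is a minimal sketch of data generalization for the grade example. It replaces each numeric score with a higher-level concept and then summarizes the result. The pass mark of 60 and the student scores are hypothetical, because the question specifies neither.

```python
from collections import Counter

# Hypothetical pass mark; the question does not specify one.
PASS_MARK = 60


def generalize_grade(score):
    """Replace a low-level numeric grade with a higher-level concept."""
    return "passing" if score >= PASS_MARK else "not passing"


grades = {"Ana": 91, "Ben": 58, "Chen": 74, "Dee": 40}  # illustrative data
generalized = {name: generalize_grade(s) for name, s in grades.items()}
summary = Counter(generalized.values())  # counts per generalized group
print(generalized)
print(summary)
```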
The ___ method starts by compressing the database representing frequent items
into a frequent pattern tree (FP-tree), and then divides the compressed database into
a set of conditional databases.
-Apriori
-FP-growth
-Eclat
-Transaction reduction - Answer--FP-growth
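The sketch below covers only the first phase of FP-growth. It compresses the transactions into an FP-tree and then pulls out one item's conditional pattern base, which is the "conditional database" the question refers to. The recursive mining of each conditional FP-tree is omitted. The transactions, `min_count`, and the alphabetical tie-break for items with equal support are assumptions made for illustration.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Compress transactions into an FP-tree (prefix tree of frequent items)."""
    # Scan 1: count item supports and keep only the frequent items.
    support = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in support.items() if c >= min_count}
    root = FPNode(None, None)
    header = {}  # item -> list of tree nodes that hold that item
    # Scan 2: insert each transaction's frequent items, most frequent first.
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-frequent[i], i))  # ties: alphabetical
        node = root
        for item in items:
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            node = node.children[item]
            node.count += 1
    return root, header


def conditional_pattern_base(item, header):
    """Collect the prefix paths (with counts) that lead to `item` in the tree."""
    base = []
    for node in header.get(item, []):
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base


# Hypothetical transactions for illustration.
transactions = [["f", "a", "c", "m"], ["f", "c", "b"], ["f", "b"],
                ["c", "b", "p"], ["f", "c", "a", "m", "p"]]
root, header = build_fp_tree(transactions, min_count=3)
print(conditional_pattern_base("b", header))
```

Transactions that share a frequent-item prefix share tree nodes, which is where the compression comes from.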
Let I = {I1, I2, I3, ..., Im} be a set of items, where each Ij denotes an item ID.
Consider the transaction database D, defined in the table below:
TRANSACTION ID | LIST OF ITEMS IN THE TRANSACTION
T1             | I2, I3, I4
T2             | I2, I5
T3             | I1, I3, I5, I6
T4             | I2, I3, I4, I7, I8
(a) Determine the support and the confidence of the association rule A ⇒ B, where
A = {I4}, B = {I7}
(b) Is the rule A ⇒ B given above strong if min_sup = 30% and min_conf = 50%? -
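Answer-(a) support = 0.25, confidence = 0.5
(Only T4 contains both I4 and I7, so support = 1/4. I4 appears in T1 and T4, so
confidence = 1/2.)
(b) No. The support (25%) is below min_sup = 30%, even though the confidence (50%)
meets min_conf = 50%.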
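The arithmetic can be reproduced directly from the transaction database D above. In this Python sketch, support is the fraction of transactions that contain an itemset, and confidence(A ⇒ B) is support(A ∪ B) / support(A). The helper names are illustrative.

```python
# Transaction database D from the question, as sets of item IDs.
D = {
    "T1": {"I2", "I3", "I4"},
    "T2": {"I2", "I5"},
    "T3": {"I1", "I3", "I5", "I6"},
    "T4": {"I2", "I3", "I4", "I7", "I8"},
}


def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for items in db.values() if itemset <= items) / len(db)


def confidence(a, b, db):
    """confidence(A => B) = support(A U B) / support(A)."""
    return support(a | b, db) / support(a, db)


A, B = {"I4"}, {"I7"}
sup = support(A | B, D)      # 1/4
conf = confidence(A, B, D)   # (1/4) / (2/4)
strong = sup >= 0.30 and conf >= 0.50
print(sup, conf, strong)     # 0.25 0.5 False
```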
Clustering is also known as ___.
-outlier detection
-unsupervised learning
-backpropagation
-boosting - Answer--unsupervised learning
Consider the set of points
p1 = (0.3, 48.17)
p2 = (49.47, 25.09)
p3 = (1.47, 0.21)
p4 = (1.14, 0.85)
p5 = (48.31, 38.19)
p6 = (26.03, 14.43)
p7 = (6.65, 49.45)
p8 = (22.48, 34.88)
Suppose you would like to partition the set into 3 clusters, 1, 2, and 3, using the
k-means algorithm. If the points p1, p2, and p3 are chosen as the initial cluster means
(that is, p1 is the center of cluster 1, p2 is the center of cluster 2, and p3 is the center
of cluster 3), which cluster will the point p4 be assigned to after the first iteration of
the k-means algorithm is completed?
(Enter 1 if the point should be assigned to cluster 1, 2 if it should be assigned to
cluster 2, and 3 if it should be assigned to cluster 3.) - Answer-3 (p4 is nearest to the
initial center p3, at a Euclidean distance of about 0.72.)
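Here is a minimal Python sketch of one k-means iteration on these points. The assignment step uses Euclidean distance to the initial centers p1, p2, and p3, and the update step then recomputes each cluster mean. Distance ties do not occur with this data, so tie-breaking is not handled.

```python
import math

points = {
    "p1": (0.3, 48.17), "p2": (49.47, 25.09), "p3": (1.47, 0.21),
    "p4": (1.14, 0.85), "p5": (48.31, 38.19), "p6": (26.03, 14.43),
    "p7": (6.65, 49.45), "p8": (22.48, 34.88),
}

# Initial means: cluster 1 = p1, cluster 2 = p2, cluster 3 = p3.
centers = {1: points["p1"], 2: points["p2"], 3: points["p3"]}


def nearest(point, centers):
    """Return the id of the center closest to `point` (Euclidean distance)."""
    return min(centers, key=lambda c: math.dist(point, centers[c]))


# Assignment step: attach every point to its nearest initial center.
assignment = {name: nearest(p, centers) for name, p in points.items()}

# Update step: recompute each cluster mean from its members.
new_centers = {}
for c in centers:
    members = [points[n] for n, k in assignment.items() if k == c]
    new_centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))

print(assignment["p4"])  # 3
print(new_centers)
```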
In the ___ step, a classifier predicts categorical labels for input data.
-construction
-learning
-classification
-training - Answer--classification
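To make the two steps concrete, here is a small Python sketch. The learning step builds a model from labeled training tuples, and the classification step uses that model to predict categorical labels. The question does not name a classification method. A nearest-centroid classifier is used here only because it is short, and the training data and the labels "low" and "high" are hypothetical.

```python
import math
from collections import defaultdict


def learn(training_tuples):
    """Learning step: build a model (one centroid per class) from labeled tuples."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in training_tuples:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {label: (s[0] / counts[label], s[1] / counts[label])
            for label, s in sums.items()}


def classify(model, point):
    """Classification step: predict the categorical label of a new tuple."""
    return min(model, key=lambda label: math.dist(point, model[label]))


# Hypothetical labeled training tuples.
training = [((1.0, 1.0), "low"), ((2.0, 1.0), "low"),
            ((8.0, 9.0), "high"), ((9.0, 8.0), "high")]
model = learn(training)
print(classify(model, (1.5, 2.0)))  # low
```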
When the labels of training tuples are not given in the first step of the data
classification process, the step is known as ___.
-supervised learning
-unsupervised learning
-overfit