DATA MINING PRACTICE EXAM #2
QUESTIONS AND ANSWERS
Discretization that happens before learning is... - Answer-...Local Discretization.
>>...Global Discretization.
...Normalization.
...Mojang!
In 10-fold Cross-Validation, how many times will each fold be used as part of the test
set? - Answer->>1
10
3
9
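A minimal sketch of why the answer is 1, assuming contiguous, unshuffled folds. The helper names k_fold_indices, cross_validation_rounds, and usage_counts are hypothetical and are not taken from any library.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold_number in range(k):
        size = base + (1 if fold_number < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validation_rounds(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    folds = k_fold_indices(n_samples, k)
    for held_out in range(k):
        # The held-out fold is the test set; the other k-1 folds are training data.
        train = [i for f in range(k) if f != held_out for i in folds[f]]
        yield train, folds[held_out]


def usage_counts(n_samples, k):
    """Count, per sample, how often it lands in the test set and the training set."""
    test_uses = [0] * n_samples
    train_uses = [0] * n_samples
    for train, test in cross_validation_rounds(n_samples, k):
        for i in test:
            test_uses[i] += 1
        for i in train:
            train_uses[i] += 1
    return test_uses, train_uses
```

With k = 10, every sample (and so every fold) is tested once and used for training in the other 9 rounds.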
Which of the following is the epsilon parameter in DBSCAN? - Answer-Number of
neighbors needed to determine which cluster to be added to.
>>Distance a point must be within to be part of a cluster.
Number of clusters.
Number of Minimum Points to make a cluster.
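To illustrate the role of epsilon, here is a rough sketch of the neighbourhood test that DBSCAN relies on, assuming Euclidean distance. The names region_query, is_core_point, eps, and min_pts are illustrative choices; this is not a full DBSCAN implementation.

```python
import math


def region_query(points, index, eps):
    """Return indices of all points within distance eps of points[index]."""
    center = points[index]
    return [i for i, p in enumerate(points) if math.dist(center, p) <= eps]


def is_core_point(points, index, eps, min_pts):
    """A point is a core point if its eps-neighbourhood (itself included)
    contains at least min_pts points."""
    return len(region_query(points, index, eps)) >= min_pts
```

Epsilon sets the radius of the neighbourhood; the minimum-points parameter sets how many points that neighbourhood must hold before it can seed a cluster.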
Using the k-Nearest Neighbors technique with k = 3, in which class would the new
value Green Circle be classified? - Answer-Cluster of Outliers
Blue Square
Green Circle
>>Red Triangle
Which of the following is used to determine the distance between Class Vectors
when using Error-Correcting Output Codes to perform multiclass evaluations? -
Answer-Mahalanobis Distance.
Manhattan Distance.
Eigenvectors.
>>Hamming Distance.
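A small sketch of Hamming-distance decoding for output codes. The 7-bit code words in CODE_WORDS are made up for illustration, and the names hamming_distance and decode are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("code words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits, code_words):
    """Pick the class whose code word is closest to the predicted bit string."""
    return min(code_words,
               key=lambda label: hamming_distance(predicted_bits, code_words[label]))


# Illustrative code words for four classes (an assumption, not from the exam).
CODE_WORDS = {
    "a": "1111111",
    "b": "0000111",
    "c": "0011001",
    "d": "0101010",
}
```

If the per-bit classifiers output 1011111, one bit differs from the code word for class a and at least three differ from every other code word, so decode returns "a" even though one classifier was wrong.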
The Tokenization process where you divide a text into individual words is... -
Answer-...n-grams.
>>...Bag of words.
...Stemming.
...Stopword.
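As a rough illustration, splitting text into individual words and counting them could look like the sketch below. The lowercase, letters-digits-apostrophes tokenization rule is an assumption made for the example; real tokenizers differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (simplified rule, see lead-in)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text):
    """Map each distinct word to the number of times it occurs; word order is lost."""
    return Counter(tokenize(text))
```

For example, bag_of_words("The cat saw the other cat.") counts "the" and "cat" twice each and the remaining words once.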
When using a data mining technique that uses a distance function, it is a good idea to...
- Answer->>...Normalize your data.
...Discretize your data.
...use Cross-Validation.
...use reservoir sampling.
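The reason is that an attribute with a large numeric range can dominate the distance. A minimal sketch, assuming min-max scaling to [0, 1] and Euclidean distance; the function name min_max_normalize and the (age, income) rows are invented for illustration.

```python
import math


def min_max_normalize(rows):
    """Rescale every column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        # A constant column carries no information, so map it to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical (age, income) rows: income spans a far larger range than age.
rows = [[25, 30000], [60, 31000], [26, 90000]]
raw_gap = math.dist(rows[0], rows[1])        # dominated by the income column
scaled = min_max_normalize(rows)
scaled_gap = math.dist(scaled[0], scaled[1])  # age now contributes comparably
```

Before scaling, the 35-year age difference between the first two rows is almost invisible next to the income difference; after scaling, both attributes are on the same footing.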
Which of the following techniques can be used to evaluate the performance of a
Numeric Prediction Model? - Answer-None of these techniques.
Confusion Matrix.
Cross-Validation.
>>Root Mean Squared Error.
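For reference, root mean squared error is the square root of the mean of the squared differences between actual and predicted values. A short sketch, with the function name rmse chosen for illustration:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A perfect prediction gives 0, and because errors are squared before averaging, a few large misses raise the score more than many small ones.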
Which of the following is a parameter used in k-Means clustering? - Answer-
>>Number of clusters.
Number of Minimum Points to make a cluster.
Distance a point must be within to be part of a cluster.
Number of neighbors needed to determine which cluster to be added to.
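To show where the number of clusters enters, here is a bare-bones sketch of the usual k-Means iteration (Lloyd's algorithm), assuming Euclidean distance and caller-supplied starting centroids. The function name k_means and its parameters are illustrative; k is simply the number of initial centroids passed in.

```python
import math


def k_means(points, initial_centroids, max_iter=100):
    """Cluster points around k centroids, where k = len(initial_centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters
```

Unlike DBSCAN, nothing here refers to a neighbourhood radius or a minimum point count; the only structural choice is how many centroids to start with.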
Which of the following Attribute Space Search techniques fixes the number of
attributes allowed in the search? - Answer-Stepwise Selection.
Genetic Algorithm Search.
>>Beam Search.
Best-first Search.
Which of the following techniques can be used to evaluate the performance of a
Clustering Model? - Answer-Cross-Validation.
Root Mean Squared Error.
Confusion Matrix.
>>None of these techniques.
The hierarchical clustering method that starts by forming 1 cluster, then recursively
evaluates the case for splitting clusters into more clusters, is the ... - Answer-...
evolutionary method.
... agglomerative method.
... iterative distance method.
>>... divisive method.
Which of the following is a technique used to address dataset sizes that does not
involve Sampling? - Answer-Bootstrap
>>Leave-one-out
k-Nearest Neighbors
Cross-Validation
Which type of component analysis transforms the dataspace based on covariance? -
Answer-Random Projection
Independent Component Analysis
>>Principal Component Analysis
Correlation Coefficient
Which of the following is not a Stemming algorithm? - Answer-Lovins.
>>Tolkien.
Paice-Husk.
Snowball.
How do you determine the error rate of a model when you use Cross-Validation? -
Answer-Use the error rate that occurs with the highest frequency within all error
rates.
>>Determine the average of all the error rates.
Use the maximal value from all the error rates.
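As a quick sketch, combining per-fold error rates into a single estimate is a plain mean, assuming folds of roughly equal size. The function name cross_validated_error and the sample numbers in the note are illustrative only.

```python
def cross_validated_error(fold_error_rates):
    """Average the error rates measured on each held-out fold."""
    if not fold_error_rates:
        raise ValueError("need at least one fold error rate")
    return sum(fold_error_rates) / len(fold_error_rates)
```

For instance, fold error rates of 0.10, 0.20, and 0.30 average to an overall estimate of 0.20, rather than the most frequent or the largest value.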