AND CORRECT ANSWERS
With the k-NN model for classification, after we have determined the k nearest neighbors of a new data
record, how is the class predicted?
-Average of the neighbors
-Through a logistic regression between the neighbors
-Majority vote determines the predicted class
-Through a linear combination of neighbors ✅✅CORRECT ANSW-majority vote determines the
predicted class
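A minimal sketch of the majority-vote step, assuming Euclidean distance and a small made-up training set (the function name knn_predict and the toy records are illustrative only, not from the course material):

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """train: list of (features, label) pairs; query: a feature tuple."""
    # Order the training records by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], query))
    # Keep the labels of the k closest records.
    labels = [label for _, label in by_distance[:k]]
    # The most frequent label among the neighbors is the prediction.
    return Counter(labels).most_common(1)[0][0]

# Hypothetical toy data: two numeric features, classes "A" and "B".
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
         ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]

print(knn_predict(train, (1.1, 1.0), k=3))  # A
```

An odd k avoids tied votes in a two-class problem.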
What statement is INCORRECT about the k-nearest neighbor (k-NN) method?
-k is an arbitrary number that can be selected by trial-and-error
-Different k values can change the performance of the classifier
-When k=1 (closest record) the classifier performance is maximum
-Too small a value for k may lead to over-fitting ✅✅CORRECT ANSW-when k=1 (closest record) the
classifier performance is maximum
Consider two models A and B. If the prediction accuracy of Model A is higher than that of Model B
for the training dataset, we can say that Model A is definitely better than Model B. ✅✅CORRECT
ANSW-false
What is the sensitivity score of the following confusion matrix given that "1" is positive? (rounded to
2 decimal places) ✅✅CORRECT ANSW-.71
sensitivity = tp/(tp+fn)
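The formula can be checked with a short sketch. The counts below are hypothetical, because the confusion matrix the question refers to is not reproduced here:

```python
def sensitivity(tp, fn):
    # Share of actual positives that the model labeled positive.
    return tp / (tp + fn)

# Hypothetical counts: 40 true positives, 10 false negatives.
print(round(sensitivity(tp=40, fn=10), 2))  # 0.8
```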
The main difference between k-NN classifiers and k-NN regression models is that the former do
not need a distance function, while the latter use the Euclidean distance function. ✅✅CORRECT
ANSW-False
What can cause the over-fitting problem in k-NN classifiers?
-splitting the data set
-incorrect distance function
-too small values of k
-too large values of k ✅✅CORRECT ANSW-too small values of k
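A rough illustration of why a very small k over-fits, assuming a one-dimensional toy data set with one deliberately mislabeled record at x = 2.0 (all names and values are made up for the example). With k=1 every training record is its own nearest neighbor, so training accuracy is perfect and the noisy record is memorized:

```python
from collections import Counter

def knn_predict(train, x, k):
    # Take the k records closest to x on the number line and vote.
    nearest = sorted(train, key=lambda rec: abs(rec[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical data; the "B" at 2.0 is noise inside a region of "A" records.
train = [(1.0, "A"), (1.5, "A"), (2.0, "B"), (2.5, "A"), (3.0, "A"),
         (6.0, "B"), (6.5, "B"), (7.0, "B")]

def training_accuracy(k):
    hits = sum(knn_predict(train, x, k) == y for x, y in train)
    return hits / len(train)

print(training_accuracy(1))         # 1.0 (the noise is memorized)
print(training_accuracy(3))         # 0.875
print(knn_predict(train, 2.1, 1))   # B (follows the noisy record)
print(knn_predict(train, 2.1, 3))   # A (the vote smooths the noise out)
```

Perfect training accuracy at k=1 says nothing about performance on new records, which is why k is usually chosen on validation data.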
We have trained classification models and their ROC curves are shown below. Given that the Area Under
the Curve (AUC) is our performance metric, which model is performing better? ✅✅CORRECT
ANSW-A
the curve that lies highest (closest to the top-left corner) has the largest AUC
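To make the comparison concrete, AUC can be approximated with the trapezoidal rule over (false positive rate, true positive rate) points. The two curves below are hypothetical, since the original figure is not available:

```python
def auc(points):
    """points: (fpr, tpr) pairs that include (0, 0) and (1, 1)."""
    pts = sorted(points)
    area = 0.0
    # Sum the trapezoids between consecutive ROC points.
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical ROC curves for two models.
roc_a = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.9), (1.0, 1.0)]
roc_b = [(0.0, 0.0), (0.2, 0.4), (0.5, 0.7), (1.0, 1.0)]

print(round(auc(roc_a), 3))  # 0.845
print(round(auc(roc_b), 3))  # 0.63
```

Curve A lies above curve B at every false positive rate, so it encloses more area.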
What is a propensity score?
-predicted probability of class membership
-An arbitrary number assigned to each record
-an indicator of the correct cut-off value
-a measure that shows accuracy of the model ✅✅CORRECT ANSW-predicted probability of class
membership
In evaluating a predictive model with a numerical target, the root mean squared error (RMSE) has
the same unit as the predicted variable. ✅✅CORRECT ANSW-true
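A small sketch shows why the units match: the errors are squared, averaged, and then square-rooted, which brings the result back to the scale of the target. The values are hypothetical (think of prices in dollars):

```python
import math

def rmse(actual, predicted):
    # Square the errors, average them, then take the square root.
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical target values and predictions, both in dollars.
actual = [200.0, 250.0, 300.0]
predicted = [210.0, 240.0, 330.0]

print(round(rmse(actual, predicted), 2))  # 19.15, also in dollars
```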
In the following confusion matrix, which cell is the FALSE POSITIVE? ✅✅CORRECT ANSW-C
lower left
What is the specificity score of the following confusion matrix given that "1" is positive? (rounded to
2 places) ✅✅CORRECT ANSW-.81
specificity = tn/(tn+fp)
What is the fall-out score of the following confusion matrix given that "1" is positive? (rounded to 2
places) ✅✅CORRECT ANSW-0.47
fall-out = fp/(fp+tn)
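Fall-out (the false positive rate) and specificity are computed from the same two cells, so for any one matrix they sum to 1. A brief sketch with hypothetical counts, since the matrices from the questions are not reproduced here:

```python
def specificity(tn, fp):
    # Share of actual negatives correctly labeled negative.
    return tn / (tn + fp)

def fall_out(tn, fp):
    # Share of actual negatives wrongly labeled positive.
    return fp / (fp + tn)

# Hypothetical counts: 30 true negatives, 20 false positives.
tn, fp = 30, 20
print(specificity(tn, fp))  # 0.6
print(fall_out(tn, fp))     # 0.4
```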
The cost of misclassification is always the same for false negative and false positive cases.
✅✅CORRECT ANSW-false
In evaluating a predictive model with a numerical target, the mean absolute error (MAE) can be
negative or positive but the mean error (ME) is always positive. ✅✅CORRECT ANSW-false
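The statement has the two measures reversed: ME keeps the sign of each error and can be negative, while MAE averages absolute values and is never negative. A quick check with made-up numbers, taking error as actual minus predicted:

```python
def mean_error(actual, predicted):
    # Signed errors can cancel out or sum to a negative value.
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_error(actual, predicted):
    # Absolute errors are non-negative, so their mean is too.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical model that over-predicts every record.
actual = [10, 20, 30]
predicted = [12, 25, 31]

print(round(mean_error(actual, predicted), 2))           # -2.67
print(round(mean_absolute_error(actual, predicted), 2))  # 2.67
```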
What is the error rate of the following confusion matrix? (rounded to 2 decimal places)
✅✅CORRECT ANSW-0.41
error rate = (fp+fn)/(tp+tn+fp+fn)
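The error rate is the share of all records that were misclassified, which is 1 minus accuracy. A sketch with hypothetical cell counts (the matrix from the question is not shown here):

```python
def error_rate(tp, fp, fn, tn):
    # Misclassified records (fp and fn) divided by all records.
    return (fp + fn) / (tp + fp + fn + tn)

# Hypothetical confusion-matrix cells.
print(round(error_rate(tp=40, fp=20, fn=10, tn=30), 2))  # 0.3
```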
In the confusion matrix the term "actual" refers to the observed labels of the data. ✅✅CORRECT
ANSW-true
Maximizing which performance metric reduces both type I and type II classification errors? ✅✅CORRECT
ANSW-AUC ROC
What is the predicted variable in the logistic regression model?
-RMSE
-Probability of class membership
-Confusion matrix
-A number between -1 and 1 ✅✅CORRECT ANSW-Probability of class membership
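A minimal sketch of how a logistic regression model turns inputs into a probability, assuming an already-fitted intercept and coefficients (the numbers below are invented for illustration):

```python
import math

def predict_probability(x, intercept, coefs):
    # Linear combination of the inputs (the log-odds, or logit).
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    # The logistic function maps any real z into the interval (0, 1).
    return 1 / (1 + math.exp(-z))

# Hypothetical fitted parameters and one record with two features.
p = predict_probability(x=[2.0, 1.0], intercept=-1.0, coefs=[0.8, -0.5])
print(round(p, 3))
```

The output always lies strictly between 0 and 1, which is why it can be read as a probability of class membership.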
Which statement is correct about the cutoff value of the probability calculated by a logistic
regression model to be used for classification?
-Larger cutoff values result in higher model performance
-Smaller cutoff values result in higher model performance
-The cutoff value is an arbitrary value determined by model performance assessment
-The cutoff value must always be set to 0.5 ✅✅CORRECT ANSW-The cutoff value is an arbitrary
value determined by model performance assessment
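Applying a cutoff is a one-line comparison, sketched below with made-up probabilities. Changing the cutoff changes which records are labeled positive, which is why it is chosen by assessing performance rather than fixed at 0.5:

```python
def classify(probabilities, cutoff=0.5):
    # Label a record 1 when its predicted probability reaches the cutoff.
    return [1 if p >= cutoff else 0 for p in probabilities]

# Hypothetical predicted probabilities of belonging to class 1.
probs = [0.15, 0.40, 0.55, 0.80]

print(classify(probs, cutoff=0.5))  # [0, 0, 1, 1]
print(classify(probs, cutoff=0.3))  # [0, 1, 1, 1]
```

Lowering the cutoff catches more positives at the price of more false positives, so the right value depends on the relative misclassification costs.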
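Input variables (features) of the logistic regression model cannot be categorical. ✅✅CORRECT
ANSW-true
raw categories cannot enter the model directly; they must first be encoded as numeric dummy (0/1) variables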
How can we turn the logistic regression model into the classification model?