QUESTIONS WITH CORRECT SOLUTIONS
We should include as many variables as possible in our models. - ANSWER-False
Which of the following could be caused by too many variables? - ANSWER-Missing value problems
Increased variance of predictions
Unstable estimates
Compared to R-squared, Adjusted R-squared adjusts for the number of variables in the
model. - ANSWER-True
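To make the adjustment concrete, here is a minimal sketch of the usual textbook formula, Adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The function name and the sample numbers are illustrative assumptions, not taken from the course material.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical numbers: the same R-squared of 0.80 with 3 vs. 20 predictors.
few = adjusted_r_squared(0.80, n_obs=50, n_predictors=3)
many = adjusted_r_squared(0.80, n_obs=50, n_predictors=20)
print(few, many)  # the value with 20 predictors is noticeably lower
```

With R-squared held fixed, adding predictors can only lower the adjusted value, which is why it is preferred for comparing models of different sizes.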
Which of the following are steps of Exhaustive search? - ANSWER-Identify all possible models for a certain number of variables
Find the best model among all possible ones as a candidate model for a certain number of variables
From all candidate models, choose the one with the best Adjusted R-squared.
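The steps listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the course's implementation: the helper names, the tiny dataset, and the use of ordinary least squares via the normal equations are all hypothetical choices made so the example runs with the standard library alone.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def adjusted_r2(columns, y):
    """Fit OLS with an intercept on the given columns; return adjusted R^2."""
    n, p = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y_i for row, y_i in zip(design, y)) for a in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b_j * x_j for b_j, x_j in zip(beta, row)) for row in design]
    y_mean = sum(y) / n
    ss_res = sum((y_i - f_i) ** 2 for y_i, f_i in zip(y, fitted))
    ss_tot = sum((y_i - y_mean) ** 2 for y_i in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def exhaustive_search(predictors, y):
    """predictors: dict of name -> values. Returns (best_subset, best_adjusted_r2)."""
    names = list(predictors)
    candidates = []
    for size in range(1, len(names) + 1):
        # Enumerate every model with exactly `size` variables and keep the best
        # one as the candidate for that size. For a fixed size, ranking by
        # adjusted R^2 gives the same order as ranking by R^2.
        scored = [(adjusted_r2([predictors[v] for v in subset], y), subset)
                  for subset in combinations(names, size)]
        candidates.append(max(scored))
    # Across the per-size candidates, choose the best adjusted R^2.
    best_score, best_subset = max(candidates)
    return best_subset, best_score


# Hypothetical data: y depends on x1 and x2; x3 is unrelated.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
}
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, -0.05]
y = [2 * a + 3 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]
best_subset, best_score = exhaustive_search(data, y)
print(best_subset, best_score)
```

The number of models grows as 2^p, so this brute-force enumeration is only practical for a small number of candidate variables.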
When which.max() is applied to the Adjusted R-squared values from Exhaustive search
regression results, it returns - ANSWER-The model with the largest Adjusted R-squared
(more precisely, the index of that model).
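As a rough Python analogue of this lookup, assume the search results are a list of Adjusted R-squared values, one per model size. The values below are made up for illustration. R's which.max() reports the 1-based position of the first maximum, while the Python expression here gives a 0-based position.

```python
# Hypothetical adjusted R-squared values for models with 1, 2, 3, 4, 5 variables.
adj_r2_by_size = [0.61, 0.74, 0.79, 0.78, 0.77]

# 0-based position of the first maximum.
best_position = max(range(len(adj_r2_by_size)), key=adj_r2_by_size.__getitem__)

# Convert to the number of variables in the selected model (1-based, as in R).
best_n_variables = best_position + 1
print(best_n_variables, adj_r2_by_size[best_position])
```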
The dependent variable of logistic regression is binary. - ANSWER-True
Customers have different intentions about buying our products. Out of 900 customers,
450 are willing to buy and the others are not. What is the entropy of this customer
sample? - ANSWER-1
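The answer follows from the entropy formula H = -sum(p_i * log2(p_i)): both classes have p = 450/900 = 0.5, so H = -(0.5 * log2(0.5) + 0.5 * log2(0.5)) = 1 bit. A small sketch to check the arithmetic (the function name is an illustrative choice):

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a sample described by its class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


buy_vs_not = entropy([450, 450])  # evenly split sample: maximum entropy for two classes
all_buy = entropy([900, 0])       # pure sample: zero entropy
print(buy_vs_not, all_buy)
```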
We have the following decision tree. The splitting rules are indicated on top of a node.
What is the classification of the leftmost leaf? - ANSWER-*follow the chart
Which of the following can be used as a stopping criterion for splitting? - ANSWER-All of the other three:
When maximum purity is obtained
When additional splits obtain no information gain
When the tree reaches the specified number of nodes or level of depth
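A minimal sketch of how these stopping criteria might be checked at a node, assuming entropy as the impurity measure and class counts as input. The function names and the max_depth and min_gain parameters are hypothetical, and only a depth limit is modeled, not a limit on the number of nodes.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a node described by its class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


def should_stop(parent, best_children, depth, max_depth=3, min_gain=0.0):
    """Return the reason to stop splitting this node, or None to keep splitting."""
    if entropy(parent) == 0.0:
        return "maximum purity"
    if depth >= max_depth:
        return "depth limit"
    if information_gain(parent, best_children) <= min_gain:
        return "no information gain"
    return None


print(should_stop([10, 0], [[5, 0], [5, 0]], depth=0))
print(should_stop([5, 5], [[5, 0], [0, 5]], depth=3))
print(should_stop([4, 4], [[2, 2], [2, 2]], depth=1))
print(should_stop([5, 5], [[5, 0], [0, 5]], depth=1))
```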
Pruning a tree helps to address the overfitting problem. - ANSWER-True