CORRECT ANSWERS
T/F When a model is over-fitted, the regression coefficients represent noise in the data, rather than
the genuine relationships in the population. ✅✅CORRECT ANSW-True
You are building a multiple linear regression model to predict median house price (MEDV) in Boston
using a data set with 12 predictors as shown in the following correlation matrix. Based on the matrix,
you would expect the violation of the multicollinearity assumption to happen between what
variables?
Hint: multicollinearity means a strong linear relationship between two predictors (independent
variables). ✅✅CORRECT ANSW-TAX and RAD. Look for the darkest off-diagonal cell, regardless of whether
the number is positive or negative. The largest absolute value is the strongest correlation.
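A minimal sketch of the "largest absolute value" rule, using made-up toy columns (X1, X2, X3 are hypothetical, not the Boston data). It computes the Pearson correlation for every pair of predictors and keeps the pair whose correlation is largest in absolute value, so a strong negative correlation counts as much as a strong positive one.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strongest_pair(columns):
    """Return (name_a, name_b, r) for the pair with the largest |r|."""
    best = None
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if best is None or abs(r) > abs(best[2]):
            best = (a, b, r)
    return best


# Hypothetical toy predictors (illustration only).
toy = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X2": [9.8, 8.0, 6.2, 3.9, 2.1],  # falls almost linearly as X1 rises
    "X3": [5.0, 3.0, 4.0, 1.0, 2.0],
}

pair = strongest_pair(toy)
print(pair)  # X1 and X2, with a strongly negative r
```

Here the flagged pair has a negative correlation, which is why the sign is ignored when scanning the matrix.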
The following linear model is developed on the normalized data to predict used car prices. Which of
the predictors has the LARGEST effect on the predicted price?
Price = -0.17 × age + 0.37 × FuelTypeD + 0.27 × FuelTypeG + 0.40 × hps - 0.63 × ccs + 0.24 × taxes ✅✅CORRECT
ANSW-CC: its coefficient (-0.63) is the largest in absolute value
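As a quick check, the comparison can be sketched in a few lines. The coefficients are copied from the model above; the dictionary keys simply reuse the term names from that equation. Because the data is normalized, the predictor with the largest absolute coefficient has the largest effect.

```python
# Coefficients of the normalized used-car model, keyed by term name.
coefs = {
    "age": -0.17,
    "FuelTypeD": 0.37,
    "FuelTypeG": 0.27,
    "hps": 0.40,
    "ccs": -0.63,
    "taxes": 0.24,
}

# Compare magnitudes, not signed values: -0.63 outweighs +0.40.
largest = max(coefs, key=lambda name: abs(coefs[name]))
print(largest, coefs[largest])
```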
T/F In the search for the best set of variables for the linear regression model, when the number of
potential predictors is small, the exhaustive search method gives significantly different and better
results than other methods. ✅✅CORRECT ANSW-False
Given the following linear regression model for predicting House Prices, which variable has the
largest effect on the predicted price?
Price = 45098 - 163.5 × Age + 767.9 × Room - 34.7 × Crime + 451.2 × Lot Size
where: Age is the age of the house, Room is the number of rooms, Crime is the crime rate per capita
in the town, and Lot Size is the lot size (sq ft) of the house ✅✅CORRECT ANSW-Cannot be determined.
Room has the largest coefficient, but the model is not normalized, so the coefficients depend on each
predictor's units and cannot be compared.
Which of the following variable search methods for the linear regression model examines all possible
combinations of variables? ✅✅CORRECT ANSW-Exhaustive search
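A minimal sketch of what "all possible combinations" means in practice. It only enumerates the candidate subsets (no model fitting), using four predictor names borrowed from the house-price card as an illustration. With p predictors there are 2^p - 1 non-empty subsets, which is why exhaustive search gets expensive quickly.

```python
from itertools import combinations


def all_subsets(predictors):
    """Yield every non-empty subset of the predictor list."""
    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            yield subset


# Illustrative predictor names only.
predictors = ["Age", "Room", "Crime", "LotSize"]
subsets = list(all_subsets(predictors))

print(len(subsets))  # 2**4 - 1 candidate models
```

An exhaustive search would fit one model per subset and keep the best according to a chosen criterion, for example adjusted R².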
T/F In the standardized linear regression model, normalized predictors don't have the same unit and
scale as the original predictors. ✅✅CORRECT ANSW-True
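One common way to normalize is the z-score, sketched here under the assumption that this is the normalization meant (the cards do not say which one is used). The lot sizes are made-up values. After the transform the values are unitless, with mean 0 and standard deviation 1, so they no longer share the unit or scale of the original predictor.

```python
from math import isclose
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical lot sizes in square feet.
lot_size_sqft = [1000.0, 1500.0, 2000.0]
lot_size_z = z_scores(lot_size_sqft)

print(lot_size_z)
print(isclose(mean(lot_size_z), 0.0, abs_tol=1e-12))
print(isclose(pstdev(lot_size_z), 1.0))
```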
We have developed a linear regression model and the residual plots are shown in the following
figure. What statement is CORRECT about the model? ✅✅CORRECT ANSW-Model is violating the
linearity assumption
T/F A linear regression model that is developed on the original data can be used to compare the
effect of predictors on the predicted target variable. ✅✅CORRECT ANSW-False
Which of the following is NOT a strategy to prevent model over-fitting?
Setting a limit on the value of the R² metric
Penalizing the model for including more variables
Adding variables to the model only if they improve the model performance and goodness-of-fit
Splitting data into train and validation sets ✅✅CORRECT ANSW-Setting a limit on the value of the R²
metric
Fall-out score (false positive rate) ✅✅CORRECT ANSW-false positives / (true negatives + false positives)
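The formula as a small function, with hypothetical counts for illustration. The denominator is every actual negative case, so fall-out is the share of actual negatives that were wrongly flagged as positive.

```python
def fall_out(false_pos, true_neg):
    """Fall-out (false positive rate) = FP / (FP + TN)."""
    return false_pos / (false_pos + true_neg)


# Hypothetical counts: 10 false positives among 100 actual negatives.
rate = fall_out(false_pos=10, true_neg=90)
print(rate)
```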
We have trained a classification model and its ROC curve is shown below. Given that the Area Under
the Curve (AUC) is our performance metric, which model is performing better? ✅✅CORRECT
ANSW-Curve A
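A rough sketch of how two ROC curves can be compared by AUC, using the trapezoidal rule. The points for `model_1` and `model_2` are hypothetical (they are not read off the figure): each is a list of (false positive rate, true positive rate) pairs sorted by false positive rate. The curve that bows further toward the top-left corner encloses more area.

```python
def auc(points):
    """Trapezoidal area under a ROC curve given sorted (fpr, tpr) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ROC points, for illustration only.
model_1 = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.9), (1.0, 1.0)]
model_2 = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]  # the diagonal: random guessing

auc_1 = auc(model_1)
auc_2 = auc(model_2)
print(auc_1, auc_2)
print("model_1" if auc_1 > auc_2 else "model_2")
```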
In the confusion matrix, which cell is the false positive? ✅✅CORRECT ANSW-Cell C
Error rate (confusion matrix) ✅✅CORRECT ANSW-Error rate = all misclassified cases / all cases =
(cell b + cell c) / total
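The same calculation as a short function, with hypothetical counts. It assumes, as the answers on these cards suggest, that cells b and c of the confusion matrix hold the false negatives and false positives; the matrix itself is not reproduced here.

```python
def error_rate(true_pos, false_neg, false_pos, true_neg):
    """Error rate = misclassified cases / all cases."""
    total = true_pos + false_neg + false_pos + true_neg
    return (false_neg + false_pos) / total


# Hypothetical confusion-matrix counts.
err = error_rate(true_pos=40, false_neg=5, false_pos=10, true_neg=45)
print(err)      # 15 misclassified out of 100
print(1 - err)  # accuracy is the complement
```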
T/F In evaluating a predictive model with a numerical target, the mean absolute error (MAE) can be
negative or positive but the mean error (ME) is always positive. ✅✅CORRECT ANSW-False
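A small worked example of why the statement is false, using made-up actual and predicted values and taking the error as actual minus predicted. The mean error keeps the signs, so it can be negative, positive, or zero. The mean absolute error averages magnitudes, so it can never be negative.

```python
def mean_error(actual, predicted):
    """ME: average of signed errors (actual - predicted)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def mean_absolute_error(actual, predicted):
    """MAE: average of absolute errors, always >= 0."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical values, for illustration only.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 25.0, 29.0]

me = mean_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(me)   # negative here: the model over-predicts on average
print(mae)  # non-negative by construction
```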