Exam Question with Solved Solutions.
What is Cook's distance used for? - Answer It measures how much all of the fitted values in the
regression model change when the ith observation is removed. Basically it is a check for influential points (outliers).
Rule of thumb: D denotes Cook's distance. If D > 4/n,
or D > 1, or any D is large relative to the others, then the observation may be an outlier and should be investigated.
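A minimal sketch of the rule of thumb, assuming a simple linear regression with one predictor so that the leverage has a closed form. The function name cooks_distance and the data are made up for illustration; in practice a statistics package would compute this.

```python
def cooks_distance(x, y):
    """Cook's distance for each observation of a one-predictor least-squares fit."""
    n = len(x)
    p = 2  # number of coefficients: intercept + slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage of this observation
        distances.append(e ** 2 / (p * mse) * h / (1 - h) ** 2)
    return distances


# Hypothetical data: the last response sits far above the trend of the others.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
d = cooks_distance(x, y)

# Apply the D > 4/n rule of thumb and report the flagged positions.
flagged = [i for i, di in enumerate(d) if di > 4 / len(x)]
print(flagged)
```

Here only the last observation exceeds 4/n, and its distance is also above 1, so both rules of thumb agree.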
If the normality assumption does not hold, we can pursue a transformation in the response
variable. T/F - Answer True
If the linearity assumption does not hold, we can pursue a transformation in the response
variable. T/F - Answer False, we pursue a transformation in the predictor variables.
R^2 will always increase if we add more predicting variables. T/F - Answer True (R^2 never decreases when a predicting variable is added)
If we want to compare models with different numbers of predicting variables, what statistic
should we use? - Answer Adjusted R^2 because it adjusts for the number of predicting
variables. Unlike R^2, it does not automatically increase when we add more predicting variables; it can decrease if the new variable explains little.
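A short sketch of the usual adjusted R^2 formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), with n observations and p predicting variables. The R^2 values below are invented to show the penalty at work.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical comparison: a sixth predictor raises R^2 only from 0.800 to 0.801.
small_model = adjusted_r2(0.800, n=50, p=5)
large_model = adjusted_r2(0.801, n=50, p=6)
print(small_model, large_model)
```

In this made-up case R^2 goes up slightly but adjusted R^2 goes down, so the smaller model is preferred.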
A statistic that effectively summarizes how well the X's are linearly related to Y is the correlation
coefficient. T/F - Answer True
T/F - The correlation coefficient cannot be used to evaluate the correlation between the
predicting variables for detecting (near) linear dependence among the variables (or
multicollinearity) - Answer False, it CAN
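How do you diagnose multicollinearity? - Answer Calculate the VIF (variance inflation factor)
for each predicting variable
VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 from regressing the jth predicting variable on the other predicting variables
If VIF_j > max(10, 1 / (1 - R^2)), where R^2 is that of the full model, then we have a multicollinearity problem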
What does the VIF measure? - Answer The VIF measures the proportional increase in the
variance of beta hat_j compared to what it would have been if the predicting variables had been
completely uncorrelated.
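A minimal sketch of the VIF calculation, assuming only two predicting variables. In that special case the R^2 from regressing one predictor on the other equals their squared correlation, so no regression fit is needed. The helper names and data are illustrative only.

```python
def squared_correlation(a, b):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab ** 2 / (saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    return 1 / (1 - squared_correlation(x1, x2))


# Hypothetical predictors that move almost in lockstep.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]
vif_collinear = vif_two_predictors(x1, x2)

# Hypothetical predictors with zero sample correlation.
u1 = [-1, 0, 1, -1, 0, 1]
u2 = [1, -2, 1, 1, -2, 1]
vif_uncorrelated = vif_two_predictors(u1, u2)

print(vif_collinear, vif_uncorrelated)
```

The nearly collinear pair gives a VIF far above 10, while the uncorrelated pair gives exactly 1, meaning no variance inflation.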
True/False: The response variable in logistic regression is a binary response? - Answer True
True/False: In logistic regression, we model the probability of a success given the predicting
variables, not the response itself. - Answer True
What are the assumptions for logistic regression? - Answer Linearity Assumption
Independence Assumption
Link Function Assumption: the link function g is the logit function
What is the logit function? - Answer The log of the odds, where the odds are the ratio of the probability of
success to the probability of failure: logit(p) = log(p / (1 - p))
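A small sketch of the logit and its inverse using only the standard library. The name inverse_logit is chosen here for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Map a log-odds value z back to a probability."""
    return 1 / (1 + math.exp(-z))


even_odds = logit(0.5)        # odds of 1, so the log-odds are 0
four_to_one = logit(0.8)      # odds of 4, so the log-odds are log(4)
round_trip = inverse_logit(logit(0.3))
print(even_odds, four_to_one, round_trip)
```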
What is the interpretation of the logistic regression coefficient? - Answer The log of the odds
ratio for an increase of one unit in the predicting variable, holding all other predicting variables fixed. We do not interpret beta with respect
to the response variable but with respect to the odds of success.
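To make the interpretation concrete, here is a sketch with one predicting variable and made-up coefficients (beta0 and beta1 are assumptions, not fitted values). It checks that raising x by one unit multiplies the odds of success by exp(beta1).

```python
import math


def success_probability(x, beta0, beta1):
    """P(success | x) under a logistic model with a single predictor."""
    return 1 / (1 + math.exp(-(beta0 + beta1 * x)))


def odds(p):
    """Odds of success for a probability p."""
    return p / (1 - p)


beta0, beta1 = -1.0, 0.7  # hypothetical coefficients

odds_at_2 = odds(success_probability(2.0, beta0, beta1))
odds_at_3 = odds(success_probability(3.0, beta0, beta1))
odds_ratio = odds_at_3 / odds_at_2

print(odds_ratio, math.exp(beta1))
```

The two printed values agree up to floating-point error, and the same ratio results for any starting value of x.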
How many regression coefficients are there for logistic regression? - Answer Since there is no
error term, you have p + 1: one for each of the p predicting variables plus the intercept.
Logistic regression is different from standard linear regression in that
a) It does not have an error term
b) The response variable is not normally distributed.
c) It models probability of a response and not the expectation of the response.
d) All of the above. - Answer d) all of the above
Which one is correct?
a) The logit link function is the only link function that can be used for modeling binary response
data.
b) Logistic regression models the probability of a success given a set of predicting variables.
c) The interpretation of the regression coefficients in logistic regression is the same as for