ISYE6414 REGRESSION MIDTERM LATEST EXAM / ISYE6414
REGRESSION LATEST EXAM
What is Cook's distance used for? - ANSWER: It measures how much all of the fitted
values in the regression model change when the ith observation is removed. Basically
it is a measure of influence used to flag outliers.
Rule of thumb: D denotes Cook's distance. If D > 4/n
or D > 1, or D is large relative to the other points, then the observation may be an
outlier and should be investigated (and possibly removed).
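The rule of thumb above can be sketched for the simplest case, a regression with one predictor. This is a minimal illustration, not a course-provided routine: the function name cooks_distance and the data are made up, and it uses the standard identity D_i = (e_i^2 / (p * MSE)) * h_i / (1 - h_i)^2, with p = 2 estimated coefficients.

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a simple linear regression y ~ b0 + b1*x."""
    n = len(x)
    p = 2  # number of estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each point in simple linear regression
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverages)
    ]


# Made-up data: the first nine points lie on y = 2x + 1, the last one does not.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 5, 7, 9, 11, 13, 15, 17, 19, 40]

d = cooks_distance(x, y)
threshold = 4 / len(x)  # the 4/n rule of thumb
flagged = [i for i, di in enumerate(d) if di > threshold]
print(flagged)
```

Only the last observation is flagged here. With several predictors the same quantity is usually taken from a statistics package rather than computed by hand.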
If the normality assumption does not hold, we can pursue a transformation in the
response variable. T/F - ANSWER: True
If the linearity assumption does not hold, we can pursue a transformation in the
response variable. T/F - ANSWER: False, we pursue a transformation in the predictor
variables.
R^2 will always increase if we add more predicting variables. T/F - ANSWER: True
(more precisely, R^2 never decreases when a predicting variable is added)
If we want to compare models with different numbers of predicting variables, what
statistic should we use? - ANSWER: Adjusted R^2 because it adjusts for the number
of predicting variables. Unlike R^2, it does not automatically increase when we add
more predicting variables.
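A short sketch of why adjusted R^2 is the fairer comparison, assuming the usual definition 1 - (1 - R^2)(n - 1)/(n - p - 1), with n observations and p predicting variables. The R^2 values are invented for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predicting variables fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical comparison: five extra predictors raise R^2 only from 0.800 to 0.801.
small = adjusted_r2(0.800, n=50, p=5)
large = adjusted_r2(0.801, n=50, p=10)
print(small, large)
```

The larger model has the higher R^2 but the lower adjusted R^2, because the penalty for the extra predictors outweighs the small gain in fit.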
A statistic that effectively summarizes how well the X's are linearly related to Y is the
correlation coefficient. T/F - ANSWER: True
T/F - The correlation coefficient cannot be used to evaluate the correlation between
the predicting variables for detecting (near) linear dependence among the variables
(or multicollinearity) - ANSWER: False, it CAN
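How do you diagnose multicollinearity? - ANSWER: Calculate the VIF (variance
inflation factor) for each predicting variable
VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 from regressing the jth predicting
variable on all the other predicting variables
If VIF_j > max(10, 1 / (1 - R^2)), where R^2 is that of the full model, then
multicollinearity is a problem
If a variable is correlated but does not have multicollinearity, is this a problem? -
ANSWER: Not necessarily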
What does the VIF measure? - ANSWER: The VIF measures the proportional increase
in the variance of beta hat compared to what it would have been if the predicting
variables had been completely uncorrelated.
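A minimal sketch of the VIF calculation for the special case of two predicting variables, where R_j^2 is simply the squared correlation between them. The function names and data are hypothetical. With more predictors, R_j^2 has to come from regressing predictor j on all the others.

```python
def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R_j^2); with two predictors R_j^2 is the squared correlation."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


def vif_is_problem(vif, model_r2):
    """Rule of thumb: VIF > max(10, 1 / (1 - R^2)), R^2 being that of the full model."""
    return vif > max(10, 1 / (1 - model_r2))


# Made-up predictors: the first pair is nearly collinear, the second is uncorrelated.
vif_collinear = vif_two_predictors([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1])
vif_uncorrelated = vif_two_predictors([-2, -1, 0, 1, 2], [2, -1, -2, -1, 2])
print(vif_collinear, vif_uncorrelated)
```

An uncorrelated pair gives a VIF of 1 (no inflation), which matches the interpretation of the VIF as a proportional increase in variance.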
True/False: The response variable in logistic regression is a binary response. -
ANSWER: True
True/False: In logistic regression, we model the probability of a success given the
predicting variables, not the response itself. - ANSWER: True
What are the assumptions for logistic regression? - ANSWER: Linearity Assumption
Independence Assumption
Link Function Assumption: the link function g is the logit function
What is the logit function? - ANSWER: The log of the odds, where the odds are the ratio
of the probability of success to the probability of failure: logit(p) = log(p / (1 - p))
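As a quick illustration, the logit and its inverse can be written in a few lines. The function names are just illustrative choices.

```python
import math


def logit(p):
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(eta):
    """Map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-eta))


print(logit(0.5))                 # even odds give log odds of zero
print(inverse_logit(logit(0.8)))  # the round trip recovers the probability
```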
What is the interpretation of the logistic regression coefficient? - ANSWER: The log of
the odds ratio for an increase of one unit in the predicting variable. We do not
interpret beta with respect to the response variable but with respect to the odds of
success.
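The interpretation above can be checked numerically. In this sketch the coefficients b0 and b1 are hypothetical values chosen for illustration, not estimates from any data set. Increasing x by one unit multiplies the odds of success by exp(b1).

```python
import math


def success_probability(x, b0, b1):
    """P(success | x) under a logistic model with a single predictor."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


def odds(p):
    """Odds of success for a probability p."""
    return p / (1 - p)


b0, b1 = -1.0, 0.7  # hypothetical coefficients

odds_at_2 = odds(success_probability(2.0, b0, b1))
odds_at_3 = odds(success_probability(3.0, b0, b1))
odds_ratio = odds_at_3 / odds_at_2
print(odds_ratio, math.exp(b1))
```

The two printed numbers agree up to rounding, so b1 itself is the log of the odds ratio for a one-unit increase in x.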
How many regression coefficients are there for logistic regression? - ANSWER: Since
there is no error term (and so no variance parameter), you have p + 1, including the
intercept.
Logistic regression is different from standard linear regression in that
a) It does not have an error term
b) The response variable is not normally distributed.
c) It models probability of a response and not the expectation of the response.
d) All of the above. - ANSWER: d) all of the above
Which one is correct?
a) The logit link function is the only link function that can be used for modeling
binary response data.
b) Logistic regression models the probability of a success given a set of predicting
variables.
c) The interpretation of the regression coefficients in logistic regression is the same
as for standard linear regression assuming normality.
d) None of the above. - ANSWER: b) Logistic regression models the probability of a
success given a set of predicting variables.
In logistic regression,
a) The estimation of the regression coefficients is based on maximum likelihood
estimation.
b) We can derive exact (closed-form expression) estimates for the regression
coefficients.
c) The estimation of the regression coefficients is based on minimizing the sum of
least squares.
d) All of the above. - ANSWER: a) The estimation of the regression coefficients is
based on maximum likelihood estimation.
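Because there is no closed-form expression, the maximum likelihood estimates are found numerically. The sketch below uses Newton-Raphson on the log-likelihood for a model with one predictor. The data are made up, the function names are illustrative, and a statistics package would normally do this step for you.

```python
import math


def sigmoid(eta):
    return 1 / (1 + math.exp(-eta))


def score(b0, b1, x, y):
    """Gradient of the log-likelihood with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        r = yi - sigmoid(b0 + b1 * xi)
        g0 += r
        g1 += r * xi
    return g0, g1


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson iterations for P(y = 1 | x) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = score(b0, b1, x, y)
        # Entries of X'WX (the negative Hessian), with weights w = p * (1 - p)
        a = b = c = 0.0
        for xi in x:
            p = sigmoid(b0 + b1 * xi)
            w = p * (1 - p)
            a += w
            b += w * xi
            c += w * xi * xi
        det = a * c - b * b
        # Newton step: beta <- beta + (X'WX)^(-1) * gradient, via the 2x2 inverse
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1


# Made-up binary data with overlap, so the estimates are finite.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 1, 0, 1]

b0_hat, b1_hat = fit_logistic(x, y)
g0_hat, g1_hat = score(b0_hat, b1_hat, x, y)
print(b0_hat, b1_hat)
```

At the returned estimates the gradient of the log-likelihood is numerically zero, which is the condition that defines the maximum likelihood solution.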