ISYE-6414 | MODULE 3 EXAM
REVIEW QUESTIONS WITH
COMPLETE ANSWERS
What are Pearson residuals in logistic regression? - Answer-3.4) Pearson residuals are
the standardized differences between the observed response $Y_i$ and the estimated
expected response $n_i\hat{p}_i$, where $\hat{p}_i$ is the estimated probability of success for
the i-th observation: $r_i = \frac{Y_i - n_i\hat{p}_i}{\sqrt{n_i\hat{p}_i(1-\hat{p}_i)}}$.
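As a rough illustration of the definition above, here is a minimal Python sketch that computes Pearson residuals for grouped binomial data. It assumes the observed counts Y_i, the trial sizes n_i, and the fitted probabilities are already available as plain lists; the function name `pearson_residuals` and the sample numbers are hypothetical.

```python
import math

def pearson_residuals(y, n, p_hat):
    """Pearson residuals (Y_i - n_i p_i) / sqrt(n_i p_i (1 - p_i)) for grouped binomial data.

    y: observed successes, n: number of trials, p_hat: fitted success probabilities.
    Assumes every p_hat value lies strictly between 0 and 1.
    """
    residuals = []
    for yi, ni, pi in zip(y, n, p_hat):
        expected = ni * pi                    # estimated expected response n_i * p_hat_i
        sd = math.sqrt(ni * pi * (1.0 - pi))  # estimated binomial standard deviation
        residuals.append((yi - expected) / sd)
    return residuals

# Hypothetical example: three groups of 10 trials each
print(pearson_residuals([3, 5, 9], [10, 10, 10], [0.3, 0.5, 0.8]))
```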
Why do we need to standardize the difference between observed and expected
responses when calculating Pearson residuals? - Answer-3.4) We standardize the
difference because the responses have different variances: $\mathrm{Var}(Y_i) = n_i p_i(1-p_i)$
depends on both $n_i$ and $p_i$.
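What are deviance residuals in logistic regression? - Answer-3.4) Deviance residuals
are the signed square roots of each observation's contribution to the deviance, that is,
of twice the difference between the i-th observation's log-likelihood contribution under
the saturated model and under the fitted model:
$d_i = \operatorname{sign}(Y_i - n_i\hat{p}_i)\sqrt{2\left[Y_i\log\frac{Y_i}{n_i\hat{p}_i} + (n_i - Y_i)\log\frac{n_i - Y_i}{n_i(1-\hat{p}_i)}\right]}$,
with the convention $0\log 0 = 0$.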
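A minimal Python sketch of deviance residuals for grouped binomial data follows, assuming the same plain-list inputs (observed successes, trial sizes, fitted probabilities). Each residual is the signed square root of twice the gap between the saturated and fitted log-likelihood contributions of one observation; the convention 0 log 0 = 0 handles groups with Y_i = 0 or Y_i = n_i. The helper names are hypothetical.

```python
import math

def _xlog_ratio(a, b):
    """a * log(a / b), using the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def deviance_residuals(y, n, p_hat):
    """Deviance residuals for grouped binomial data (0 < p_hat < 1 assumed)."""
    residuals = []
    for yi, ni, pi in zip(y, n, p_hat):
        mu = ni * pi  # fitted expected number of successes
        # Twice the saturated-minus-fitted log-likelihood contribution;
        # note n_i * (1 - p_i) = n_i - mu, and the binomial coefficients cancel.
        contrib = 2.0 * (_xlog_ratio(yi, mu) + _xlog_ratio(ni - yi, ni - mu))
        sign = 1.0 if yi >= mu else -1.0
        residuals.append(sign * math.sqrt(max(contrib, 0.0)))
    return residuals

# Hypothetical example, including the edge cases Y_i = 0 and Y_i = n_i
print(deviance_residuals([3, 0, 10], [10, 10, 10], [0.3, 0.2, 0.9]))
```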
What is a saturated model? - Answer-3.4) A saturated model estimates the probability
of success for each observation as the observed proportion $Y_i/n_i$, using one
parameter per observation, so it does not depend on any predicting variable. No logistic
regression or other model fitting is required.
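Because the saturated model gives every observation its own free probability, its log-likelihood is at least as large as that of any fitted binomial model. The Python sketch below checks this on hypothetical counts; the function names are illustrative, and the binomial coefficients are dropped because they cancel in any comparison between models fitted to the same data.

```python
import math

def binomial_loglik(y, n, p):
    """Binomial log-likelihood, omitting the constant log binomial coefficients."""
    total = 0.0
    for yi, ni, pi in zip(y, n, p):
        if yi > 0:
            total += yi * math.log(pi)
        if ni - yi > 0:
            total += (ni - yi) * math.log(1.0 - pi)
    return total

def saturated_probs(y, n):
    """Saturated model: estimate each p_i by the observed proportion Y_i / n_i."""
    return [yi / ni for yi, ni in zip(y, n)]

# Hypothetical data, compared with a deliberately crude constant-probability fit
y, n = [2, 7, 10], [10, 10, 10]
ll_saturated = binomial_loglik(y, n, saturated_probs(y, n))
ll_constant = binomial_loglik(y, n, [0.5, 0.5, 0.5])
print(ll_saturated, ll_constant)
```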
What is the relationship between deviance residuals in logistic regression and residuals
in standard linear regression? - Answer-3.4) Deviance residuals in logistic regression
are the analog of residuals in standard linear regression; the sum of their squares (the
deviance) plays the role of the residual sum of squares.
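What approximate distribution do Pearson residuals follow, and why? - Answer-3.4)
Pearson residuals approximately follow a standard normal distribution because, by the
central limit theorem, the binomial response $Y_i$ is approximately normal when $n_i$ is
large enough; standardizing it by its estimated standard deviation gives a quantity with
mean about 0 and variance about 1.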
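To see the normal approximation at work, the sketch below simulates Pearson residuals when the binomial model is exactly correct, with a hypothetical n_i = 50 and p = 0.3; their sample mean and standard deviation should come out close to 0 and 1. It uses only Python's standard `random` and `statistics` modules, and the function name is illustrative.

```python
import math
import random
import statistics

def simulate_pearson_residuals(n_trials, p, reps, seed=0):
    """Simulate Pearson residuals under a correctly specified binomial model."""
    rng = random.Random(seed)
    sd = math.sqrt(n_trials * p * (1.0 - p))
    out = []
    for _ in range(reps):
        # One binomial draw, built as a sum of n_trials Bernoulli(p) trials
        y = sum(1 for _ in range(n_trials) if rng.random() < p)
        out.append((y - n_trials * p) / sd)
    return out

res = simulate_pearson_residuals(n_trials=50, p=0.3, reps=4000)
print(statistics.mean(res), statistics.stdev(res))
```

With very small n_i (for example, ungrouped binary data with n_i = 1) each residual can take only two values, so the normal approximation should not be expected to hold there.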
What approximate distribution do deviance residuals follow if model assumptions hold?
- Answer-3.4) Deviance residuals have an approximately standard normal distribution if
the model assumptions hold and the model is a good fit.
How can we evaluate whether a logistic regression model is a good fit? - Answer-3.4)
We can evaluate the model fit by checking whether the Pearson or deviance residuals
are normally distributed, using histograms and normality plots.
What are the null and alternative hypotheses (H0 and Ha) in the context of logistic
model goodness of fit? - Answer-3.4) H0: The logistic model fits the data.
Ha: The logistic model does not fit the data.
What is the test statistic for evaluating logistic model goodness of fit, and what is its
distribution under the null hypothesis? - Answer-3.4) The test statistic is
$D = \sum_{i=1}^{n} d_i^2$, where $d_i$ are the deviance residuals. Under H0, D approximately
follows a chi-squared distribution with $n - p - 1$ degrees of freedom, where n is the
number of observations and p is the number of predicting variables.
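Given deviance residuals, the statistic and its degrees of freedom take only a few lines. In this Python sketch the name `gof_deviance_statistic` is hypothetical, and p is read as the number of predicting variables, with the intercept accounting for the extra 1 in n − p − 1.

```python
def gof_deviance_statistic(deviance_residuals, num_predictors):
    """Return (D, df) for the deviance goodness-of-fit test.

    D is the sum of squared deviance residuals; df = n - p - 1, where n is the
    number of observations and p the number of predicting variables.
    """
    n = len(deviance_residuals)
    D = sum(d * d for d in deviance_residuals)
    df = n - num_predictors - 1
    return D, df

# Hypothetical deviance residuals from a model with 2 predictors
print(gof_deviance_statistic([1.0, -2.0, 0.5, 0.0, -1.0], 2))
```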
How do you determine whether to reject the null hypothesis in logistic model goodness
of fit? - Answer-3.4) We reject H0 if the p-value, $P(\chi^2_{n-p-1} > D)$, is small. For a
good fit, we want large p-values.
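The p-value can be computed without a statistics library by using the closed-form chi-square upper tail for integer degrees of freedom: a finite Poisson-type sum for even df, plus an `erfc` term for odd df. This is a minimal Python sketch under that assumption; in practice one would more likely call a library routine, and the function names and the default alpha = 0.05 here are illustrative choices.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) for a positive integer df (closed form)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: sum_{j=0}^{df/2 - 1} e^{-y} y^j / j!  (terms computed in log space)
        return sum(math.exp(-y + j * math.log(y) - math.lgamma(j + 1))
                   for j in range(df // 2))
    # Odd df: erfc(sqrt(y)) + sum_{j=1}^{(df-1)/2} e^{-y} y^(j - 1/2) / Gamma(j + 1/2)
    return math.erfc(math.sqrt(y)) + sum(
        math.exp(-y + (j - 0.5) * math.log(y) - math.lgamma(j + 0.5))
        for j in range(1, (df - 1) // 2 + 1))

def reject_good_fit(D, df, alpha=0.05):
    """Reject H0 (the model fits) when the p-value P(chi^2_df > D) is below alpha."""
    p_value = chi2_sf(D, df)
    return p_value < alpha, p_value

# Hypothetical deviance D = 20.0 on 2 degrees of freedom
print(reject_good_fit(20.0, 2))
```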
What visual methods can be used to assess the goodness of fit in a logistic model? -
Answer-3.4) You can use normal probability plots and histograms of the residuals to
evaluate whether the residuals are approximately normally distributed; approximate
normality suggests a good model fit.
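A normal probability plot pairs the ordered residuals with standard normal quantiles; points close to a straight line suggest approximate normality. The Python sketch below only computes those pairs (plotting is left out), using `statistics.NormalDist` from the standard library. The plotting positions (i − 0.5)/m are one common choice among several, and the function name is hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(residuals):
    """Pairs (theoretical normal quantile, ordered residual) for a normal probability plot."""
    ordered = sorted(residuals)
    m = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / m; other conventions exist
    theoretical = [std_normal.inv_cdf((i - 0.5) / m) for i in range(1, m + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals
for q, r in normal_qq_points([0.3, -1.2, 2.0, 0.1]):
    print(round(q, 3), r)
```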
True or False: To evaluate whether the model is a good fit (or whether the assumptions
of logistic regression hold) we can use the Pearson or deviance residuals. - Answer-3.4)
True. We can use histograms and normality plots of the Pearson or deviance residuals
to check whether they are approximately normally distributed. If they are, this supports
the conclusion that the model is a good fit.
True or False: Another approach to evaluating goodness of fit is through hypothesis
testing. - Answer-3.4) True. In the goodness of fit test, the null hypothesis is that the
model fits well, and the alternative is that the model does not fit well.
What is the distribution of the goodness of fit test statistic under the null hypothesis? -
Answer-3.4) Under the null hypothesis, the test statistic has an approximate chi-square
distribution with $n - p - 1$ degrees of freedom.
True or False: We reject the null hypothesis for the goodness of fit if the p-value is large.
- Answer-3.4) False. If the p-value is small, we reject the null hypothesis of good fit,
concluding that the model is NOT a good fit. Unlike most hypothesis tests, here we want
large p-values.
What do large p-values indicate in a goodness of fit hypothesis test? - Answer-3.4)
Large p-values indicate that the model is a plausibly good fit.
What is the test statistic in general for the goodness of fit test? - Answer-3.4) The test
statistic for the goodness of fit test is the deviance, that is, the sum of the squared
deviance residuals.
What is the difference between a test for a subset of regression coefficients and a
goodness of fit test? - Answer-3.4) A test for a subset of regression coefficients
compares the likelihood of a reduced model versus a full model, assessing the
predictive power of the model.
A goodness of fit test compares the likelihoods of the saturated model versus the fitted
model, assessing whether the model assumptions hold.
What is the key difference in inference between the test for a subset of coefficients and
the goodness of fit test? - Answer-3.4) The subset test provides inferences on the
predictive power of the model, even if assumptions don't hold.
The goodness of fit test provides inferences about whether the model assumptions,
such as the S-shaped logistic function or a linear relationship between the log odds and
the predictors, hold true.
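True or False: The difference in log-likelihood between two models is used as the test
statistic for both testing subsets of regression coefficients and goodness of fit. - Answer-
3.4) True. Both statistics are twice a difference in log-likelihoods: full versus reduced
model for the subset test, and saturated versus fitted model for the goodness of fit test.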
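Both statistics can be written as twice a log-likelihood difference. The Python sketch below makes the parallel explicit with hypothetical log-likelihood values; under H0, the subset statistic is compared with a chi-square distribution whose degrees of freedom equal the number of coefficients being tested. The function names are illustrative.

```python
def subset_test_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for a subset of coefficients: 2 * (l_full - l_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)

def gof_test_statistic(loglik_fitted, loglik_saturated):
    """Deviance goodness-of-fit statistic: 2 * (l_saturated - l_fitted)."""
    return 2.0 * (loglik_saturated - loglik_fitted)

# Hypothetical log-likelihoods for reduced, full (fitted) and saturated models
l_reduced, l_full, l_saturated = -120.0, -115.5, -100.0
print(subset_test_statistic(l_reduced, l_full))  # evidence for predictive power
print(gof_test_statistic(l_full, l_saturated))   # evidence against model assumptions
```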
What does the goodness-of-fit test compare in logistic regression? - Answer-3.4) It
compares the likelihoods of the saturated model versus the fitted model.
True or False: Predictive power implies that the model assumptions hold well. - Answer-
3.4) False. Predictive power means the predicting variables can predict the data even if
some model assumptions don't hold.
Why might the logistic regression model not be a good fit for some binary data? -
Answer-3.4) The relationship might be non-linear, or key variables may be missing,
among other reasons.
What can overdispersion indicate in logistic regression? - Answer-3.4) Overdispersion
can indicate that the binomial distribution is not appropriate, possibly due to correlation
among responses or unmodeled heterogeneity.
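A common rough check for overdispersion divides the residual deviance (or, alternatively, the sum of squared Pearson residuals) by its degrees of freedom n − p − 1; under a well-specified binomial model the ratio should be near 1. The Python sketch below assumes grouped binomial data with reasonably large n_i, since this check is not informative for ungrouped binary responses; the function name and numbers are hypothetical.

```python
def dispersion_estimate(deviance, n_obs, num_predictors):
    """Deviance-based dispersion estimate phi_hat = D / (n - p - 1).

    Under a well-specified binomial model phi_hat should be near 1;
    values well above 1 are commonly read as a sign of overdispersion.
    """
    df = n_obs - num_predictors - 1
    if df <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return deviance / df

# Hypothetical numbers: residual deviance 90 from 50 grouped observations and 4 predictors
print(dispersion_estimate(90.0, 50, 4))
```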
What is a possible solution if a predictor has a long right tail in logistic regression? -
Answer-3.4) Apply a log transformation to the predictor for a better fit.
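To illustrate why the log helps with a long right tail, the Python sketch below compares the moment-based sample skewness of a hypothetical, strongly right-skewed predictor (powers of 2) before and after taking logs. The data and the function name are made up for illustration, and the transformation assumes strictly positive predictor values.

```python
import math
import statistics

def sample_skewness(values):
    """Moment-based sample skewness: mean cubed deviation over (population SD) cubed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - mean) ** 3 for v in values]) / sd ** 3

# Hypothetical right-skewed predictor values (must be positive for the log)
x = [2 ** k for k in range(10)]
log_x = [math.log(v) for v in x]
print(sample_skewness(x), sample_skewness(log_x))
```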
True or False: A model that fits well always has good predictive power. - Answer-3.4)
False. Good fit does not always mean good predictive power.
What is the canonical link function in logistic regression? - Answer-3.4) The logit
function.
Name two alternative link functions to logit for modeling binomial response data. -
Answer-3.4) Probit and complementary log-log functions.
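All three links turn the linear predictor into a probability; they differ in shape, since the logit and probit curves are symmetric about p = 0.5 while the complementary log-log curve is not. A minimal Python sketch of the inverse links, using only the standard library; the function names are illustrative.

```python
import math
from statistics import NormalDist

# Inverse link functions: each maps a linear predictor eta = beta_0 + beta_1 x_1 + ...
# to a success probability in [0, 1].

def inv_logit(eta):
    """Logit link: log(p / (1 - p)) = eta."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # this form avoids overflow for very negative eta
    return z / (1.0 + z)

def inv_probit(eta):
    """Probit link: Phi^{-1}(p) = eta, with Phi the standard normal CDF."""
    return NormalDist().cdf(eta)

def inv_cloglog(eta):
    """Complementary log-log link: log(-log(1 - p)) = eta."""
    return 1.0 - math.exp(-math.exp(eta))

for eta in (-2.0, 0.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```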
What advantage does the logit function have over other S-shaped functions in logistic
regression? - Answer-3.4) It provides ease of interpretation in terms of log odds and is
fully efficient for parameter estimation.
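The log-odds interpretation means that, under the logit model, a one-unit increase in a predictor multiplies the odds of success by exp(beta) for that predictor, holding the others fixed. The Python sketch below checks this numerically for a single hypothetical predictor; the coefficient values are made up.

```python
import math

def odds(p):
    """Odds of success p / (1 - p)."""
    return p / (1.0 - p)

def prob_from_logit(beta0, beta1, x):
    """Success probability under the logit model log(p / (1 - p)) = beta0 + beta1 * x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

beta0, beta1 = -1.0, 0.4  # hypothetical coefficients
ratio = odds(prob_from_logit(beta0, beta1, 3.0)) / odds(prob_from_logit(beta0, beta1, 2.0))
# A one-unit increase in x multiplies the odds by exp(beta1)
print(ratio, math.exp(beta1))
```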