Regression diagnostics
Biometry 755
Spring 2009
Regression diagnostics – p. 1/48
Introduction
Every statistical method is developed based on assumptions.
The validity of results derived from a given method depends
on how well the model assumptions are met. Many statistical
procedures are “robust”, which means that only extreme
violations of the assumptions impair the ability to draw valid
conclusions. Linear regression falls in the category of robust
statistical methods. However, this does not relieve the
investigator from the burden of verifying that the model
assumptions are met or, at least, not grossly violated. In
addition, it is always important to demonstrate how well the
model fits the observed data, and this is assessed in part
using the techniques we’ll learn in this lecture.
Different types of residuals
Recall that the residuals in regression are defined as $y_i - \hat{y}_i$,
where $y_i$ is the observed response for the $i$th observation,
and $\hat{y}_i$ is the fitted response at $x_i$.
There are other types of residuals that will be useful in our
discussion of regression diagnostics. We define them on the
following slide.
Different types of residuals (cont.)
Raw residuals: $r_i = y_i - \hat{y}_i$
Standardized residuals: $z_i = \dfrac{r_i}{s}$, where $s$ is the estimated
error standard deviation (i.e. $s = \hat{\sigma} = \sqrt{\text{MSE}}$).
Studentized residuals: $r_i^* = \dfrac{z_i}{\sqrt{1 - h_i}}$, where $h_i$ is called the
leverage. (More later about the interpretation of $h_i$.)
Jackknife residuals: $r_{(-i)} = r_i^* \dfrac{s}{s_{(-i)}}$, where $s_{(-i)}$ is the
estimated error standard deviation computed with the $i$th
observation deleted.
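The four definitions can be traced numerically. The following is a minimal sketch, not the course's own software: it assumes simple linear regression (one predictor), for which the leverage has the closed form 1/n + (x_i - x̄)² / Sxx, and it obtains s(−i) by literally refitting the line with the ith observation deleted. The data and the helper names `fit_line` and `residual_types` are made up for illustration.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x. Returns (b0, b1, s), s = sqrt(MSE)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # n - 2 because two coefficients are estimated
    return b0, b1, s

def residual_types(x, y):
    """Raw, standardized, studentized and jackknife residuals for each point."""
    n = len(x)
    b0, b1, s = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    rows = []
    for i in range(n):
        raw = y[i] - (b0 + b1 * x[i])            # r_i
        z = raw / s                              # z_i = r_i / s
        h = 1 / n + (x[i] - xbar) ** 2 / sxx     # leverage (simple regression only)
        stud = z / math.sqrt(1 - h)              # r_i^*
        # s_(-i): refit with the ith observation deleted
        _, _, s_del = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        jack = stud * s / s_del                  # r_(-i)
        rows.append({"raw": raw, "standardized": z, "leverage": h,
                     "studentized": stud, "jackknife": jack})
    return rows

# Hypothetical data; the last response is deliberately out of line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.5, 3.1, 6.9, 7.2, 10.8, 11.5, 14.6, 22.0]
rows = residual_types(x, y)
for r in rows:
    print(f"{r['raw']:7.3f} {r['standardized']:7.3f} "
          f"{r['studentized']:7.3f} {r['jackknife']:7.3f}")
```

For the last observation, the jackknife residual should come out larger in magnitude than the studentized residual, because deleting that point shrinks the estimated error standard deviation. Refitting n times is wasteful for large data sets, but it keeps the code aligned with the definition.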
Which residual to use?
The standardized, studentized and jackknife residuals are all
scale-independent and are therefore preferred to raw
residuals. Of these, jackknife residuals are the most sensitive to
outliers and are superior at revealing other
problems with the data. For that reason, most diagnostics rely
on jackknife residuals. Whenever we have a
choice in the residual analysis, we will select jackknife
residuals.
Analysis of residuals - Normality
Recall that an assumption of linear regression is that the error
terms are normally distributed. That is, $\varepsilon \sim \text{Normal}(0, \sigma^2)$. To
assess this assumption, we will use the residuals to look at:
• histograms
• normal quantile-quantile (qq) plots
• Shapiro-Wilk test
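
The first two checks in the list can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not a prescribed procedure: the residual values are made up, the plotting positions (k − 0.5)/n are one common convention among several, and the helper names `qq_points` and `histogram_counts` are hypothetical.

```python
from statistics import NormalDist

def qq_points(residuals):
    """Pair each sorted residual with the standard normal quantile at (k - 0.5)/n."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def histogram_counts(residuals, n_bins=5):
    """Count residuals in n_bins equal-width bins spanning min to max."""
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for r in residuals:
        k = min(int((r - lo) / width), n_bins - 1)  # put the maximum in the last bin
        counts[k] += 1
    return counts

# Hypothetical residuals, e.g. jackknife residuals from a fitted model.
residuals = [-1.8, -0.9, -0.6, -0.2, 0.0, 0.1, 0.4, 0.7, 1.1, 1.9]
points = qq_points(residuals)
counts = histogram_counts(residuals)

for q, r in points:
    print(f"{q:6.2f}  {r:6.2f}")  # roughly a straight line suggests normality
print(counts)
```

If the residuals are approximately normal, the printed pairs should fall close to a straight line and the bin counts should look roughly symmetric and bell-shaped. The formal test in the list is not implemented here; a library routine such as `scipy.stats.shapiro` provides it.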