Regression: A Gentle Introduction to Residual
Analysis – Checking for Violations and Remedies
(Compiled by Emmanuel Makotsi, BSc Mathematics & Statistics)
(+27621456161)
Assumption 1: Correct Functional Form (Linear Relationship)
The functional form specified for the model may or may not match the true relationship in the data.
In linear regression, the mean of the response $y_i$ is a linear combination of the
predictors $x_{1i}, \dots, x_{ki}$.
In simple words, the relationship between the predictors and the
dependent (outcome/response) variable must be linear.
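Written out, the statement above takes the following form. The intercept and slope symbols $\beta_0, \dots, \beta_k$ are notation assumed here for illustration; they do not appear in the original text.

```latex
E(y_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki}
```

With a single predictor this reduces to $E(y_i) = \beta_0 + \beta_1 x_i$, the simple linear regression model.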
The linearity assumption can be checked with scatter plots or the multiple R, but these
are not sufficient: a quadratic relationship can look linear on a scatter plot and still
produce a high correlation. A more reliable way to detect violations is to examine the residuals.
If the functional form of a regression model is incorrect, the residual plots
constructed by using the model often display a pattern suggesting the form of a
more appropriate model. For instance, if we use a simple linear regression model
when the true relationship between y and x is curved, the residual plot will have a
curved appearance.
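The effect described above can be reproduced with a minimal sketch. The data here are synthetic and noise-free (y = x squared for x = 1 to 10), chosen only for illustration; they are not the sample data used in this handout, and NumPy is an assumed tool rather than the software that produced the outputs shown.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for illustration only):
# y follows a curved (quadratic) relationship with x.
x = np.arange(1, 11, dtype=float)
y = x ** 2

# Fit a straight line y = b0 + b1 * x by least squares.
b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x
residuals = y - fitted

# The Pearson correlation is high even though the relationship is curved.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")

# The residuals are not randomly scattered around zero:
# they are positive at both ends and negative in the middle.
for xi, ei in zip(x, residuals):
    print(f"x = {xi:4.1f}  residual = {ei:7.2f}")
```

Plotting the residuals against x (or against the fitted values) would show a U-shaped curve, which is the pattern that signals a misspecified functional form.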
We illustrate the above with the output below.
The table below shows bivariate sample data, that is, ordered pairs of two variables
(x and y).
Note that y is the response/outcome variable and x is the predictor.
The outputs below show the scatter plot of the two variables, together with
Pearson’s correlation coefficient and other sample statistics.
Model Fit Measures
Model    R        R²
1        0.981    0.963
Scatterplot
At first glance, both the scatter plot and Pearson’s correlation coefficient suggest a strong linear
relationship, but the residuals plot below shows that a quadratic model would fit the data better
than a linear model.
Residuals plot
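One way to follow up on a curved residual plot is to fit both a linear and a quadratic model and compare them. The sketch below does this on synthetic data (a quadratic trend plus random noise, generated with a fixed seed); the data, the helper `r_squared`, and the curvature check are assumptions made for illustration and are not the handout's own data or method.

```python
import numpy as np

def r_squared(y, fitted):
    """Proportion of the variance in y explained by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumed for illustration): quadratic trend plus random noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.0 + 0.5 * x + 0.3 * x ** 2 + rng.normal(0.0, 1.0, size=x.size)

# Fit a straight line and a quadratic by least squares.
linear_fit = np.polyval(np.polyfit(x, y, deg=1), x)
quadratic_fit = np.polyval(np.polyfit(x, y, deg=2), x)

r2_linear = r_squared(y, linear_fit)
r2_quadratic = r_squared(y, quadratic_fit)
print(f"Linear    R^2 = {r2_linear:.3f}")
print(f"Quadratic R^2 = {r2_quadratic:.3f}")

# Numerical stand-in for "looking at the residual plot": correlate each
# model's residuals with the centred squared predictor. A value far from
# zero means curvature is still left in the residuals.
curvature = (x - x.mean()) ** 2
curv_linear = np.corrcoef(y - linear_fit, curvature)[0, 1]
curv_quadratic = np.corrcoef(y - quadratic_fit, curvature)[0, 1]
print(f"Curvature left in linear residuals:    {curv_linear:.3f}")
print(f"Curvature left in quadratic residuals: {curv_quadratic:.3f}")
```

The linear model can have a high R² and still leave a clear curved pattern in its residuals, while the quadratic model removes that pattern. This is why the residual plot, rather than R or R² alone, is used to judge the functional form.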