RM - Unit 120 - Simple Linear Regression
Book: Analysing Data Using Linear Models
Chapter 4: 4.3, 4.4, 4.5, 4.6, 4.7
Chapter 4.3: Linear regression
Prediction model - a reasonable approximation of the data.
Linear regression - finding a straight line to approximate the data points. We describe the behaviour of
the dependent variable (the Y-variable on the vertical axis) on the basis of the independent variable (the
X-variable on the horizontal axis) using a linear equation.
We say that we regress variable Y on variable X.
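A linear equation like this can be sketched in a few lines of Python. The intercept and slope values below are made up purely for illustration; they are not the coefficients from the book's example.

```python
# A minimal sketch of a linear prediction model.
# intercept and slope are hypothetical values, chosen for illustration.
def predict(x, intercept, slope):
    """Return the Y-value the straight line prescribes for a given X-value."""
    return intercept + slope * x

# With intercept 100 and slope 2, an X-value of 343 yields Y = 786.
y_hat = predict(343, intercept=100, slope=2)
print(y_hat)  # 786
```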
Chapter 4.4: Residuals
Residuals or errors - the discrepancy between, in this example, the actual amount spent (the actual
value) and the amount prescribed by the linear equation.
809 - 786 = 23 euros: 809 is the amount actually spent on holidays, 786 is the amount spent on
holidays according to the linear equation, and 23 is the residual.
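The residual calculation from the example above, written out in Python:

```python
# Residual = actual value minus the value the linear equation prescribes.
actual = 809     # euros actually spent on holidays
predicted = 786  # euros according to the linear equation
residual = actual - predicted
print(residual)  # 23
```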
Chapter 4.5: Least squares regression lines
Ordinary Least Squares (OLS) - we draw a line that is more or less in the middle of the Y-values.
Two criteria: we want the sum of the residuals to be 0 (about half of them negative, and half of
them positive), and we want the residuals to be as small as possible.
We can achieve both of these when we use as our criterion the idea that the sum of the squared
residuals be as small as possible. This way the variance of the residuals is as small as possible.
1. Numerical search - try some reasonable combinations of values for the intercept and slope, and
for each combination calculate the sum of the squared residuals. Around the combination with the
lowest value, tweak the intercept and slope a bit to find even lower values for the sum of the
squared residuals. Use a stopping rule; otherwise the search never ends.
2. Analytical approach - for problems that are not too complex, like this linear regression problem,
there are simple mathematical formulas that give the intercept and slope minimising the sum of the
squared residuals directly.
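Both approaches can be sketched in Python. The data and search grid below are made up for illustration; the closed-form estimates use the standard least squares formulas (slope = Sxy / Sxx, intercept = mean(Y) - slope * mean(X)).

```python
# Made-up illustrative data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def ssr(intercept, slope):
    """Sum of squared residuals for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# 1. Numerical search: start from the best point on a coarse grid,
#    then tweak intercept and slope with ever smaller steps.
best = min(((b0 / 10, b1 / 10) for b0 in range(-20, 21)
                               for b1 in range(0, 41)),
           key=lambda p: ssr(*p))
step = 0.1
while step > 1e-5:  # stopping rule: quit once the tweaks are tiny
    b0, b1 = best
    neighbours = [(b0 + i * step, b1 + j * step)
                  for i in (-1, 0, 1) for j in (-1, 0, 1)]
    better = min(neighbours, key=lambda p: ssr(*p))
    if ssr(*better) < ssr(*best):
        best = better   # a neighbour is better: move there
    else:
        step /= 2       # no improvement: refine the step size

# 2. Analytical approach: the closed-form least squares estimates.
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

# Both approaches land on (almost) the same line.
```

The analytical route gives the answer in one step, which is why it is preferred whenever such formulas exist; the numerical search generalises to models where they do not.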