MA317 (2020/21) 4 MODELLING EXPERIMENTAL DATA
2 Lecture: Revision simple linear regression I
The example of life expectancy (years) and average income (USD) of EU countries is used to revise simple linear
regression. Least-squares estimation and maximum-likelihood estimation of the intercept and slope parameters are
derived. The distribution of the parameter estimators is discussed.
2.1 Introduction
Let us assume we have a data set of $n$ pairs $(x_i, y_i)$, $i = 1, \ldots, n$, where $x$ is the vector containing all the $x_i$ values
(predictor variable) and $y$ is the vector containing all the $y_i$ values (response variable), and that we want to fit a regression
line (line of best fit) to the data, e.g.
\[ y = a + b x. \]
Then we can use the statistical model of a simple linear regression (see, for example, Ross, 2009, pages 353-378) to find
the ‘best’ values for $a$ and $b$, denoted $\hat{a}$ and $\hat{b}$.
Here, we use the model that proposes
\[ Y_i \sim N(a + b x_i, \sigma^2), \quad i = 1, 2, \ldots, n, \]
where the random variables $Y_1, Y_2, \ldots, Y_n$ are independently distributed, and $\sigma > 0$, $a \in \mathbb{R}$ and $b \in \mathbb{R}$ are unknown
parameters.
NOTE:
A random variable $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma^2$ if its pdf $f$ is given by
\[ f(\mu, \sigma^2; x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \quad x \in \mathbb{R},\ \mu \in \mathbb{R},\ \sigma > 0. \]
The normal distribution with parameters $\mu$ and $\sigma^2$ is denoted by $N(\mu, \sigma^2)$.
$X \sim N(\mu, \sigma^2)$ is shorthand for: the random variable $X$ is distributed according to $N(\mu, \sigma^2)$.
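As a quick numerical check of the density in the note, the following R sketch codes the formula directly and compares it with R's built-in dnorm(). The function name normal_pdf and the evaluation points are illustrative assumptions, not part of the lecture material.

```r
# Normal pdf written out from the definition in the note.
# 'normal_pdf' is a hypothetical helper name chosen for this sketch.
normal_pdf <- function(x, mu, sigma) {
  1 / (sqrt(2 * pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-3, 3, by = 0.5)   # arbitrary evaluation points

# dnorm() is parameterised by the standard deviation 'sd', not the variance
check_std   <- all.equal(normal_pdf(x, mu = 0, sigma = 1), dnorm(x, mean = 0, sd = 1))
check_other <- all.equal(normal_pdf(x, mu = 1, sigma = 2), dnorm(x, mean = 1, sd = 2))
print(c(check_std, check_other))
```

Both comparisons should agree up to floating-point tolerance. Note that N(µ, σ²) is written in terms of the variance, whereas dnorm() expects the standard deviation.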
Therefore, our model assumes that each individual response $y_i$ can be modelled as an observation from a normal distribution with
mean $a + b x_i$ and standard deviation $\sigma$.
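To make the model assumption concrete, the following R sketch simulates responses from a normal distribution with mean a + b x_i and standard deviation σ. The values of n, a, b, sigma and the predictor values are assumptions chosen purely for illustration; they are not estimates from the lecture's data.

```r
# Simulate n responses from the model Y_i ~ N(a + b * x_i, sigma^2).
# All numeric values below are illustrative assumptions.
set.seed(1)                         # make the simulation reproducible
n     <- 50
a     <- 70                         # assumed intercept
b     <- 2                          # assumed slope
sigma <- 1.5                        # assumed standard deviation (sigma > 0)
x     <- seq(0, 5, length.out = n)  # fixed predictor values

# rnorm() is vectorised over 'mean', so each y[i] gets its own mean a + b * x[i];
# it takes the standard deviation (not the variance) as 'sd'
y <- rnorm(n, mean = a + b * x, sd = sigma)

head(cbind(x, y))
```

Each simulated y value scatters around the line a + b x, with the same spread σ at every x.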
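Using this model we can calculate the least-squares estimators (LSE) as
\[ \hat{b} = \frac{S_{xy}}{S_{xx}}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}, \]
where the LSE minimise the sum of squared differences, defined by
\[ S(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2, \]
with
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}). \]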
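A minimal R sketch of the least-squares computation follows. It assumes the standard closed-form solution for simple linear regression (slope equal to the sum of cross-deviations of x and y divided by the sum of squared deviations of x, intercept equal to mean(y) minus slope times mean(x)) and checks it against lm(). The five data points and all variable names are made up for illustration.

```r
# Toy data (made up for this sketch, not taken from the lecture)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least-squares estimates
s_xx  <- sum((x - mean(x))^2)               # sum of squared deviations of x
s_xy  <- sum((x - mean(x)) * (y - mean(y))) # sum of cross-deviations
b_hat <- s_xy / s_xx                        # slope estimate
a_hat <- mean(y) - b_hat * mean(x)          # intercept estimate

# Sum of squared differences that the estimates minimise
sse <- function(a, b) sum((y - a - b * x)^2)

# Cross-check against R's built-in linear model fit
fit <- lm(y ~ x)
same_as_lm <- all.equal(c(a_hat, b_hat), unname(coef(fit)))

print(c(a_hat = a_hat, b_hat = b_hat))
print(same_as_lm)
```

For this toy data the estimates come out at about 0.05 for the intercept and 1.99 for the slope. Evaluating sse() at any other pair (a, b) gives a larger value than sse(a_hat, b_hat).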
Example 2.1. Life expectancy (years) and average income (USD) of EU countries (for further details see Example 3.2
later on).
> euyear2008
                  income (USD) lifespan (years)
Austria               49525.06            80.45
Belgium               47148.85            80.11
Bulgaria               6546.31            73.32
Cyprus                31409.84            79.66
Czech Republic        20728.85            77.21
Denmark               62035.78            78.70
Estonia               17541.30            73.97
Finland               50775.44            79.79
France                44471.50            81.52
Germany               44524.52            80.09
Greece                31173.57            79.96
Hungary               15408.01            74.01
Ireland               60178.22            79.86
Italy                 38384.51            81.95
Latvia                14937.07            72.24
Lithuania             14034.31            71.82
Luxembourg           117954.68            80.52
Netherlands, The      53075.91            80.40
Poland                13857.40            75.53
Portugal              22955.13            79.25
Romania                9299.74            73.37
Slovak Republic       18211.64            74.81
Slovenia              26910.67            78.97
Spain                 35000.35            81.09
Sweden                52884.46            81.24
United Kingdom        43360.77            79.90
We use R to compute the least-squares estimates of the intercept $a$ and the slope $b$ of a simple linear regression for the
response variable lifespan (years) $y$ and the predictor variable average income (USD) $x$:
\[ y = a + b x, \]