By: Thomas Konings
Lecture 1: Introduction and Recap
Basic Regression Theory
Basic equation: y_i = α + β·x_i + u_i
Draw the line? By hand it is not possible to minimize the average errors → how to do this? Setting α = E(y_i) and β = 0 already makes the average error zero, but that line is not interesting ➔ so minimize the squared errors (MSE) instead; intuition: big errors are worse and get penalized more heavily.
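A minimal numeric sketch of that point (not from the lecture; simulated data with arbitrary true values α = 1, β = 2, using numpy): the flat line α = E(y_i), β = 0 already has an average error of about zero, but a much larger squared error than the OLS line.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)          # arbitrary "true" alpha = 1, beta = 2

# Degenerate fit: alpha = mean(y), beta = 0 -> the average error is exactly zero
resid_flat = y - y.mean()
print("average error, flat line:", resid_flat.mean())        # ~0 by construction
print("MSE, flat line          :", (resid_flat ** 2).mean())

# OLS line: minimises the mean squared error instead
beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
resid_ols = y - (alpha_hat + beta_hat * x)
print("MSE, OLS line           :", (resid_ols ** 2).mean())  # much smaller
```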
How to estimate? ➔ First, assume α = 0 and E(y_i) = E(x_i) = 0 (i.e. the variables are measured in deviations from their means)
➔ The least squares estimator is then: β̂ = Σ_i(x_i·y_i) / Σ_i(x_i²)
Essentially, OLS is the sample analogue of β = Cov(x_i, y_i)/Var(x_i)
(by the law of large numbers: sample means tend to the true expected values as N increases)
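A minimal sketch of the sample-analogue idea (not from the lecture; simulated data with an arbitrary true slope of 1.5, using numpy; np.polyfit is only an independent cross-check):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.5 + 1.5 * x + rng.normal(size=500)          # arbitrary true slope of 1.5

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # sample Cov / sample Var
beta_check = np.polyfit(x, y, deg=1)[0]                     # slope from a standard least-squares fit

print(beta_hat, beta_check)   # the two numbers coincide up to floating-point error
```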
Assumption 4 (the regressor is uncorrelated with the error, Cov(x_i, u_i) = 0) means that Cov(x_i, y_i − β·x_i) = 0 (the second argument is the error rewritten from the model with α = 0)
Then, this is the same as Cov(x_i, y_i) − β·Cov(x_i, x_i) = 0
which, solving for β, gives β = Cov(x_i, y_i)/Var(x_i) (the covariance of x with itself is its variance)
Note: this relation is exact in the population; the errors in the least squares estimator come from estimating the covariances from a finite sample (the LLN is not exact in finite samples).
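A minimal sketch of that point (simulated data, arbitrary true β = 2, α = 0 and centred x as above): the sample Cov/Var ratio drifts towards the true β as N grows.

```python
import numpy as np

rng = np.random.default_rng(2)
beta_true = 2.0                                   # arbitrary true slope

for n in (50, 500, 5_000, 50_000):
    x = rng.normal(size=n)
    y = beta_true * x + rng.normal(size=n)        # alpha = 0, centred x
    beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    print(f"N = {n:>6}: beta_hat = {beta_hat:.4f}")   # drifts towards 2.0 as N grows
```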
The other assumptions mean that OLS is the Best Linear Unbiased Estimator (BLUE):
Estimator: it is computed from the sample, so it is not the true β
Unbiased: across (infinitely many) repeated samples, the expected value of the estimated β equals the true β (see the simulation sketch after this list)
Linear: the fitted relationship is a straight line
Best: among all linear unbiased estimators, OLS has the smallest variance (no other straight line does better)
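A minimal simulation sketch of unbiasedness (not from the lecture; arbitrary true β = 2, 2,000 repeated samples of 30 observations each): every single estimate misses the true β, but on average they are right.

```python
import numpy as np

rng = np.random.default_rng(3)
beta_true, n_samples, n_obs = 2.0, 2_000, 30      # arbitrary illustrative values

estimates = np.empty(n_samples)
for s in range(n_samples):
    x = rng.normal(size=n_obs)
    y = 1.0 + beta_true * x + rng.normal(size=n_obs)
    estimates[s] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print("average of the estimates:", estimates.mean())   # close to the true 2.0 -> unbiased
print("spread of the estimates :", estimates.std())    # individual estimates still vary
```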
How restrictive is linear?
(1) Very: if the outcome is naturally bounded ([0,1], > 0, etc.), a linear model cannot restrict its predictions to that range
(2) Not so much: y_i = β₁·x_i + β₂·x_i² + ⋯ + u_i is still fine (linear in the parameters), but not y_i = β₁·x_i/(1 + β₂·x_i) + u_i (the parameters enter non-linearly); see the sketch below
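A minimal sketch of the "linear in parameters" point (not from the lecture; simulated data with arbitrary true coefficients 1.0, 0.5, −0.8): a polynomial in x is still an ordinary least squares problem, because x² is just another column of the design matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=300)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=0.3, size=300)   # arbitrary true coefficients

X = np.column_stack([np.ones_like(x), x, x**2])   # regressors: constant, x, x^2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)      # plain least squares still works
print(coef)                                       # roughly [1.0, 0.5, -0.8]
```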
What is the distribution of estimated β? (full derivation in slides)
➔ the estimated β follows a normal distribution, centred on the true β, with variance σ_u² / Σ(x_i − x̄)²
For small N the test statistic follows a Student t-distribution (because the error variance σ_u² has to be estimated); for large N it becomes normal through the central limit theorem
Note: the sum Σ(x_i − x̄)² is bigger for larger N and goes up with greater dispersion of the x_i, so the estimated β becomes more precise as either increases; and the further the x_i are from zero, the harder it is to pin down the constant α.
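A minimal simulation sketch of the precision point (not from the lecture; arbitrary true β = 2, 5,000 repeated samples of 50 observations for each dispersion level): the spread of the estimated β shrinks as the x_i become more dispersed.

```python
import numpy as np

rng = np.random.default_rng(5)
beta_true, n_samples, n_obs = 2.0, 5_000, 50      # arbitrary illustrative values

for x_scale in (0.5, 1.0, 2.0):                   # increasing dispersion of x
    estimates = np.empty(n_samples)
    for s in range(n_samples):
        x = rng.normal(scale=x_scale, size=n_obs)
        y = 1.0 + beta_true * x + rng.normal(size=n_obs)
        estimates[s] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    print(f"sd(x) = {x_scale}: sd(beta_hat) = {estimates.std():.3f}")   # shrinks as x spreads out
```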