Omitted Relevant Variable
One case of misspecification is when we omit a relevant variable from the model. We can observe
how this affects the estimated coefficients by looking at a simple model.
Consider the simple two-variable case, where we estimate the model:
$$y_i = \beta_0 + \beta_1 x_i + v_i \tag{F}$$
when the true model is of the form:
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + v_i \tag{T}$$
If we estimate $\beta_1$ by OLS using the false model, we obtain an estimator whose expected value is:
$$E(\hat{\beta}_1) = \beta_1 + \beta_2 \frac{\operatorname{cov}(x_i, z_i)}{\operatorname{var}(x_i)}$$
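The expression above can be derived by substituting the true model (T) into the OLS slope formula for the false model (F). This sketch assumes the standard exogeneity condition $E(v_i \mid x_i, z_i) = 0$ for the error in the true model, which the notes do not state explicitly. It uses $\sum_i (x_i - \bar{x}) = 0$, so $\beta_0$ drops out, and $\sum_i (x_i - \bar{x}) x_i = \sum_i (x_i - \bar{x})^2$.

```latex
\begin{aligned}
\hat{\beta}_1
  &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
   = \frac{\sum_i (x_i - \bar{x})\, y_i}{\sum_i (x_i - \bar{x})^2} \\
  &= \frac{\sum_i (x_i - \bar{x})(\beta_0 + \beta_1 x_i + \beta_2 z_i + v_i)}{\sum_i (x_i - \bar{x})^2} \\
  &= \beta_1
     + \beta_2 \frac{\sum_i (x_i - \bar{x})\, z_i}{\sum_i (x_i - \bar{x})^2}
     + \frac{\sum_i (x_i - \bar{x})\, v_i}{\sum_i (x_i - \bar{x})^2}
\end{aligned}
```

Taking expectations conditional on the $x_i$ and $z_i$ removes the last term. This leaves $\beta_1$ plus $\beta_2$ times the sample covariance of $x$ and $z$ divided by the sample variance of $x$. In large samples that ratio approaches $\operatorname{cov}(x_i, z_i)/\operatorname{var}(x_i)$.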
Therefore the OLS estimator of $\beta_1$ will be biased unless:
I. $\beta_2 = 0$; however, this simply means that the false model is the same as the true model.
II. $\operatorname{cov}(x_i, z_i) = 0$. This states that the omitted variable $z_i$ is uncorrelated with $x_i$.
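Note: We don't need to worry about $\operatorname{var}(x_i)$ when signing the bias, because a variance is always positive. It affects the size of the bias but gives no indication of whether the bias is positive or negative. The direction is determined by the signs of $\beta_2$ and $\operatorname{cov}(x_i, z_i)$.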
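The sketch below computes the bias term directly from the population quantities. It illustrates the two no-bias cases and the point that $\operatorname{var}(x_i)$ only scales the bias. The function names `omitted_variable_bias` and `bias_direction` are hypothetical helpers introduced for illustration. The inputs are assumed to be known population values, which in practice they are not.

```python
def omitted_variable_bias(beta2: float, cov_xz: float, var_x: float) -> float:
    """Bias term beta2 * cov(x, z) / var(x) from the omitted-variable formula."""
    if var_x <= 0:
        # A variance is always positive; zero variance means x is constant.
        raise ValueError("var(x) must be strictly positive")
    return beta2 * cov_xz / var_x


def bias_direction(beta2: float, cov_xz: float) -> str:
    """Sign of the bias: depends only on beta2 and cov(x, z), since var(x) > 0."""
    product = beta2 * cov_xz
    if product > 0:
        return "upward"
    if product < 0:
        return "downward"
    return "none"


if __name__ == "__main__":
    # Case I: beta2 = 0, so there is no bias whatever cov(x, z) is.
    print(omitted_variable_bias(0.0, 0.3, 2.0))
    # Case II: cov(x, z) = 0, so there is no bias whatever beta2 is.
    print(omitted_variable_bias(0.5, 0.0, 2.0))
    # Both nonzero: var(x) changes the magnitude, not the sign.
    print(omitted_variable_bias(0.5, 1.0, 4.0), bias_direction(0.5, 1.0))
```

The four sign combinations of $\beta_2$ and $\operatorname{cov}(x_i, z_i)$ map onto upward bias (same signs) or downward bias (opposite signs). No value of $\operatorname{var}(x_i)$ changes that.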
Example: Education and Wages
A simple example is to look at how wages are determined. A simple model would suggest that education is a key factor in determining the wage you earn: the higher your
education, the higher your wage. However, one key variable left out of this model is ability.
$$y_i = \beta_0 + \beta_1 \text{school}_i + v_i \tag{F}$$
$$y_i = \beta_0 + \beta_1 \text{school}_i + \beta_2 \text{ability}_i + v_i \tag{T}$$
The coefficient on ability, $\beta_2$, will be positive: more able people earn more than less able people because they have a higher marginal product of labour (MPL), and people are paid their MPL. The covariance between education and ability, $\operatorname{cov}(\text{school}_i, \text{ability}_i)$, will also be positive. The marginal cost of education is lower for more able people because they learn more easily, so they are more likely to invest in schooling.
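Since both $\beta_2 > 0$ and $\operatorname{cov}(\text{school}_i, \text{ability}_i) > 0$, the bias term is positive:
$$E(\hat{\beta}_1) = \beta_1 + \beta_2 \frac{\operatorname{cov}(\text{school}_i, \text{ability}_i)}{\operatorname{var}(\text{school}_i)} > \beta_1$$
Estimating the false model therefore overstates the effect of schooling on wages (upward bias).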
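A quick Monte Carlo check of the wage example follows. It uses made-up parameter values that come from nowhere in the notes: $\beta_0 = 1.0$, $\beta_1 = 0.08$ and $\beta_2 = 0.5$. Ability is drawn from N(0, 1), and schooling = 12 + 2·ability + noise with standard deviation 2. These numbers are purely hypothetical. They are chosen so that cov(school, ability) = 2 and var(school) = 4 + 4 = 8, giving a predicted bias of 0.5 × 2/8 = 0.125. The function names are illustrative only.

```python
import random


def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_wages(n: int = 50_000, beta0: float = 1.0, beta1: float = 0.08,
                   beta2: float = 0.5, seed: int = 0) -> tuple[list[float], list[float]]:
    """Draw data from the true model (T) with hypothetical parameter values."""
    rng = random.Random(seed)
    school, wage = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)
        # Hypothetical link: more able people get more schooling,
        # so cov(school, ability) = 2 * var(ability) = 2 > 0.
        s = 12.0 + 2.0 * ability + rng.gauss(0.0, 2.0)
        v = rng.gauss(0.0, 1.0)
        wage.append(beta0 + beta1 * s + beta2 * ability + v)
        school.append(s)
    return school, wage


if __name__ == "__main__":
    school, wage = simulate_wages()
    # Regressing wage on schooling alone is the false model (F): ability is omitted.
    slope = ols_slope(school, wage)
    predicted = 0.08 + 0.5 * 2.0 / 8.0  # beta1 + beta2 * cov / var
    print(f"short-regression slope: {slope:.3f}, predicted: {predicted:.3f}, true beta1: 0.080")
```

With these settings the short-regression slope should land near 0.08 + 0.125 = 0.205, well above the true $\beta_1$ of 0.08. This matches the upward bias implied by the formula.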