Endogeneity, IV, GMM, and MLE
Fei Deng
May 21, 2025
Abstract
This document provides a comprehensive review of Weeks 1, 2, and 3 from
my Econometrics 2 course, focusing on endogeneity, instrumental variables (IV),
Generalized Method of Moments (GMM), and Maximum Likelihood Estimation
(MLE). It is designed for my preparation for an exam on May 26, 2025, assuming
limited prior econometrics knowledge. The review covers key concepts, examples,
mathematical derivations, practice questions, and exam preparation tips, presented
accessibly.
1 Introduction
This review summarizes Weeks 1, 2, and 3 of my Econometrics 2 course. Week 1
addresses endogeneity in the Classical Linear Regression Model (CLRM) and
Instrumental Variables (IV) as a solution. Week 2 introduces the Generalized Method of
Moments (GMM), extending IV to handle complex models. Week 3 covers Maximum
Likelihood Estimation (MLE) and likelihood-based testing, focusing on binary choice
models. The material builds on Econometrics 1, and this summary is crafted to aid my
exam preparation with intuitive explanations.
2 Week 1: Endogeneity and Instrumental Variables
2.1 Classical Linear Regression Model (CLRM)
The CLRM assumes a linear relationship:
$$y_i = x_i'\beta + \varepsilon_i, \tag{1}$$
where:
• $y_i$: Dependent (endogenous) variable (e.g., earnings).
• $x_i$: Explanatory (exogenous) variables (e.g., years of schooling).
• $\beta$: Coefficients to estimate.
• $\varepsilon_i$: Error term (unobserved factors).
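Exogeneity assumption: The error term has zero mean conditional on the explanatory
variables, which implies that the two are uncorrelated:
$$E[\varepsilon_i \mid x_i] = 0 \implies \operatorname{Cov}(x_i, \varepsilon_i) = 0. \tag{2}$$
If exogeneity holds, Ordinary Least Squares (OLS) provides unbiased and consistent
estimates of $\beta$. If it fails, the regressors are endogenous and OLS no longer recovers $\beta$.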
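To make the unbiasedness claim concrete, here is a minimal Monte Carlo sketch in Python. The coefficient values, sample size, and normal distributions are hypothetical choices for illustration and are not taken from the course material. The error is drawn independently of the regressor, so exogeneity holds by construction.

```python
import numpy as np

def ols(X, y):
    """Return the OLS coefficients, i.e. the least-squares solution of X b = y."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
beta = np.array([1.0, 0.5])   # hypothetical true intercept and slope
n, reps = 500, 2000           # hypothetical sample size and number of replications

estimates = np.empty((reps, 2))
for r in range(reps):
    x = rng.normal(size=n)
    eps = rng.normal(size=n)              # independent of x, so exogeneity holds
    X = np.column_stack([np.ones(n), x])  # intercept column plus regressor
    y = X @ beta + eps
    estimates[r] = ols(X, y)

# Averaging over replications approximates E[beta_hat]
mean_estimate = estimates.mean(axis=0)
print(mean_estimate)
```

Under these assumptions the average of the estimates should sit very close to the true coefficients, which is what unbiasedness means in practice.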
2.2 Endogeneity
Endogeneity occurs when an explanatory variable $x_i$ is correlated with the error term:
$\operatorname{Cov}(x_i, \varepsilon_i) \neq 0$. This violates the exogeneity assumption, making OLS biased and
inconsistent.
Example 1 (Earnings and Schooling). Consider the model:
$$\log(\text{earnings}) = \beta_1 + \beta_2 s + \varepsilon, \tag{3}$$
where $s$ is years of schooling. If individuals choose schooling based on unmeasured ability,
which also affects earnings, then ability is part of $\varepsilon$, so $\operatorname{Cov}(s, \varepsilon) \neq 0$.
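The example can be illustrated with a small simulation. This is only a sketch: the return to schooling, the effect of ability, and the way schooling depends on ability are all hypothetical numbers chosen for illustration. Ability is generated but deliberately left out of the regression, so it ends up in the error term.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta1, beta2 = 1.0, 0.08   # hypothetical intercept and true return to schooling
gamma = 0.5                # hypothetical effect of ability on log earnings

ability = rng.normal(size=n)                  # unobserved, variance 1
s = 12 + 2.0 * ability + rng.normal(size=n)   # schooling chosen partly on ability
log_earn = beta1 + beta2 * s + gamma * ability + rng.normal(scale=0.5, size=n)

# Regress log earnings on schooling only; ability is omitted
X = np.column_stack([np.ones(n), s])
b_ols = np.linalg.lstsq(X, log_earn, rcond=None)[0]

# Large-sample bias of the slope: gamma * Cov(s, ability) / Var(s)
# Here Cov(s, ability) = 2 and Var(s) = 2^2 + 1 = 5
bias_theory = gamma * 2.0 / (2.0**2 + 1.0)

print("true beta2:      ", beta2)
print("OLS slope:       ", b_ols[1])
print("beta2 + bias:    ", beta2 + bias_theory)
```

In this setup the OLS slope overstates the return to schooling, because part of the effect of ability is attributed to schooling.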
2.3 Causes of Endogeneity
There are five main causes of endogeneity:
1. Omitted Variables: Missing variables correlated with both $x_i$ and $y_i$.
• Example: Ability affects schooling and earnings but is omitted.
• Math: True model is $y_i = x_i'\beta + w_i'\gamma + \varepsilon_i$, but estimated as $y_i = x_i'\beta + u_i$, where
$u_i = w_i'\gamma + \varepsilon_i$. If $\operatorname{Cov}(x_i, w_i) \neq 0$ and $\gamma \neq 0$, then in general $\operatorname{Cov}(x_i, u_i) \neq 0$.
2. Measurement Error: Errors in measuring $x_i$ or $y_i$.
• Example: Misreported schooling puts the reporting error into the regression error, which is then correlated with the observed regressor.
3. Simultaneity: $y_i$ and $x_i$ affect each other.
• Example: Schooling affects earnings, but expected earnings influence schooling.
• Math: $y_i = \beta x_i + \varepsilon_i$, $x_i = \alpha y_i + z_i$.
4. Sample Selection: A non-random sample biases results.
• Example: Studying income only for employed individuals.
5. Misspecified Dynamics: Ignoring lagged effects in time-series data.
• Example: Autoregressive model $y_t = \gamma y_{t-1} + \varepsilon_t$ with serially correlated $\varepsilon_t$, so that $y_{t-1}$ is correlated with $\varepsilon_t$.
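For the simultaneity case, the reason $x_i$ is correlated with $\varepsilon_i$ can be seen by solving the two-equation system listed above for $x_i$. The short derivation below is a sketch under two added assumptions that the notes do not state: $\alpha\beta \neq 1$, and $z_i$ is uncorrelated with $\varepsilon_i$. The symbol $\sigma_\varepsilon^2$ denotes $\operatorname{Var}(\varepsilon_i)$.

```latex
\begin{align*}
x_i &= \alpha(\beta x_i + \varepsilon_i) + z_i
      && \text{substitute } y_i = \beta x_i + \varepsilon_i \\
(1 - \alpha\beta)\,x_i &= \alpha\varepsilon_i + z_i \\
x_i &= \frac{\alpha\varepsilon_i + z_i}{1 - \alpha\beta} \\
\operatorname{Cov}(x_i, \varepsilon_i) &= \frac{\alpha\,\sigma_\varepsilon^2}{1 - \alpha\beta}
      && \text{using } \operatorname{Cov}(z_i, \varepsilon_i) = 0
\end{align*}
```

Under these assumptions the covariance is nonzero whenever $\alpha \neq 0$, that is, whenever $y_i$ feeds back into $x_i$.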
2.4 Consequences of Endogeneity
Endogeneity leads to:
• Bias: OLS estimates are biased:
$$E[\hat{\beta}_{OLS} \mid X] = \beta + (X'X)^{-1}X'E[\varepsilon \mid X] \neq \beta. \tag{4}$$