Jeffrey M. Wooldridge
Chapter 1 The Nature of Econometrics and Economic Data
Part 1 Regression Analysis with Cross-Sectional Data
Chapter 2 The Simple Regression Model
Chapter 3 Multiple Regression Analysis: Estimation
Chapter 4 Multiple Regression Analysis: Inference
Chapter 5 Multiple Regression Analysis: OLS Asymptotics
Chapter 6 Multiple Regression Analysis: Further Issues
Chapter 7 Multiple Regression Analysis with Qualitative Information: Binary Variables
Chapter 8 Heteroskedasticity
Chapter 9 More on Specification and Data Problems
Part 2 Regression Analysis with Time Series Data
Chapter 10 Basic Regression Analysis with Time Series Data
Chapter 11 Further Issues in Using OLS with Time Series Data
Chapter 12 Serial Correlation and Heteroskedasticity in Time Series Regression
Part 3 Advanced Topics
Chapter 13 Pooling Cross Sections across Time: Simple Panel Data Methods
Chapter 14 Advanced Panel Data Methods
Chapter 15 Instrumental Variables Estimation and Two Stage Least Squares
Chapter 16 Simultaneous Equations Models
Chapter 17 Limited Dependent Variable Models and Sample Selection Corrections
Chapter 18 Advanced Time Series Topics
Chapter 19 Carrying Out an Empirical Project
Appendix: Some Fundamentals of Probability
Introductory Econometrics Study Notes by Zhipeng Yan
Chapter 1 The Nature of Econometrics and Economic Data
I. The goal of any econometric analysis is to estimate the parameters in the
model and to test hypotheses about these parameters; the values and signs of
the parameters determine the validity of an economic theory and the effects of
certain policies.
II. Panel data - advantages:
1. Having multiple observations on the same units allows us to control for certain
unobserved characteristics of individuals, firms, and so on. The use of more than
one observation can facilitate causal inference in situations where inferring
causality would be very hard if only a single cross section were available.
2. They often allow us to study the importance of lags in behavior or in the results of
decision making.
Part 1 Regression Analysis with Cross-Sectional Data
Chapter 2 The Simple Regression Model
I. Model: y = b0 + b1x + u
1. Population regression function (PRF): E(y|x) = b0 + b1x
2. systematic part of y: b0 + b1x
3. unsystematic part: u
II. Sample regression function (SRF): yhat = b0hat + b1hat*x
1. PRF is something fixed, but unknown, in the population. Since the SRF is
obtained for a given sample of data, a new sample will generate a different slope
and intercept.
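The PRF-vs-SRF distinction can be illustrated with a short simulation. This is a minimal pure-Python sketch, not from the text: the true parameters b0 = 1 and b1 = 2 and all data are made-up assumptions. Each new sample drawn from the same population yields a different b0hat and b1hat around the fixed population values.

```python
# Sketch: the PRF (b0 = 1, b1 = 2, hypothetical values) is fixed in the
# population, but each sample generates a different SRF.
import random

def ols_simple(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    return b0, b1

random.seed(0)
for sample in range(3):
    x = [random.uniform(0, 10) for _ in range(200)]
    u = [random.gauss(0, 1) for _ in range(200)]
    y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]  # PRF plus noise
    b0hat, b1hat = ols_simple(x, y)
    print(f"sample {sample}: b0hat = {b0hat:.3f}, b1hat = {b1hat:.3f}")
```

Each printed pair differs slightly, while the PRF itself never changes.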
III. Correlation: it is possible for u to be uncorrelated with x while being
correlated with functions of x, such as x^2.
E(u|x) = E(u) ⇒ Cov(u, x) = 0, but not vice versa.
IV. Algebraic properties of OLS statistics
1. The sum of the OLS residuals is zero.
2. The sample covariance between each regressor and the residuals is zero.
Consequently, the sample covariance between the fitted values and the residuals is
zero.
3. The point of sample means, (xbar, ybar), is on the OLS regression line.
4. The goodness-of-fit of the model is invariant to changes in the units of y or x.
5. The homoskedasticity assumption plays no role in showing OLS estimators are
unbiased.
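Properties 1–3 above can be verified numerically. A small pure-Python sketch with made-up data (the checks hold for any sample, up to floating-point error):

```python
# Verify the algebraic properties of OLS on a toy dataset (values made up).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

print(sum(resid))                                # property 1: residuals sum to zero
print(sum(xi * ri for xi, ri in zip(x, resid)))  # property 2: regressor-residual covariance zero
print(b0 + b1 * xbar - ybar)                     # property 3: (xbar, ybar) lies on the line
```

All three printed values are zero up to rounding.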
V. Variance
1. Var(b1hat) = σ^2/SSTx, where σ^2 = Var(u) and SSTx = Σ(xi − xbar)^2.
a. more variation in the unobservables (u) affecting y makes it more difficult to
precisely estimate b1.
b. More variability in x is preferred: the more spread out the sample values of the
independent variable, the easier it is to trace out the relationship between E(y|x)
and x. That is, the easier it is to estimate b1.
2. Standard error of the regression (also called the standard error of the estimate
or the root mean squared error): σhat = [Σ uhat_i^2 / (n − 2)]^(1/2).
Chapter 3 Multiple Regression Analysis: Estimation
I. The power of multiple regression analysis is that it allows us to do in
nonexperimental environments what natural scientists are able to do in a
controlled laboratory setting: keep other factors fixed.
II. Model: Y = b0 + b1x1 + b2x2 + u
b1hat = (Σ vhat_i1 y_i) / (Σ vhat_i1^2), summing over i = 1, …, n, where vhat_i1
are the OLS residuals from a simple regression of x1 on x2.
1. v is the part of x1 that is uncorrelated with x2, or v is x1 after the effects of x2
have been partialled out, or netted out. Thus, b1 measures the sample relationship
between y and x1 after x2 has been partialled out.
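The partialling-out formula above can be checked directly. In this sketch, the model y = 3 + 1.5·x1 − 0.7·x2 (coefficients are made-up, with no error term so the fit is exact) is used: regress x1 on x2, keep the residuals v, then b1hat = Σ v_i y_i / Σ v_i^2 recovers the true b1 = 1.5.

```python
# Partialling out: b1 from the multiple regression equals the slope of y
# on the residuals of x1 regressed on x2. Data and coefficients made up.
import random
random.seed(1)

x1 = [random.uniform(0, 5) for _ in range(50)]
x2 = [a + random.uniform(0, 2) for a in x1]        # x1 and x2 correlated
y  = [3 + 1.5 * a - 0.7 * b for a, b in zip(x1, x2)]  # exact, no error term

# Step 1: residuals v from the simple regression of x1 on x2 (with intercept)
n = len(x1)
x1bar, x2bar = sum(x1) / n, sum(x2) / n
d1 = sum((b - x2bar) * (a - x1bar) for a, b in zip(x1, x2)) / \
     sum((b - x2bar) ** 2 for b in x2)
v = [a - (x1bar + d1 * (b - x2bar)) for a, b in zip(x1, x2)]

# Step 2: b1hat = sum(v_i * y_i) / sum(v_i^2)
b1 = sum(vi * yi for vi, yi in zip(v, y)) / sum(vi * vi for vi in v)
print(b1)   # -> 1.5 up to floating-point error
```

Because Σ v_i = 0 and Σ v_i x2_i = 0 by construction, only the x1 part of y survives in the numerator, which is why the true slope comes back exactly here.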
III. Goodness-of-fit
1. R2 = the squared correlation coefficient between the actual y and the fitted
values yhat.
2. R2 never decreases because the sum of squared residuals never increases when
additional regressors are added to the model.
IV. Regression through the origin:
1. OLS residuals no longer have a zero sample average.
2. R2 can be negative. This means that the sample average ybar “explains” more of the
variation in y than the explanatory variables do.
V. MLR Assumptions:
A1: linear in parameters.
A2: random sampling.
A3: Zero conditional mean: E(u|x1, x2, …, xk) = 0
When A3 holds, we say that we have exogenous explanatory variables. If xj is
correlated with u for any reason, then xj is said to be an endogenous explanatory
variable.
A4: No perfect collinearity.
A1–A4 ⇒ unbiasedness of OLS.
VI. Overspecifying the model:
1. Including one or more irrelevant variables does not affect the unbiasedness of the
OLS estimators.
VII. Variance of OLS estimators:
A5: homoskedasticity
1. Gauss – Markov assumptions: A1 – A5
2. Var(bj) = σ^2 / [SSTj (1 − Rj^2)], where Rj^2 is the R2 from regressing xj on
all other independent variables (including an intercept).
a. The error variance, σ^2, is a feature of the population; it has nothing to do with the
sample size.
b. SSTj, the total sample variation in xj: a small sample size ⇒ small SSTj
⇒ large Var(bj).
c. Rj^2: high correlation between two or more independent variables is called
multicollinearity.
3. A high degree of correlation between certain independent variables can be
irrelevant to how well we can estimate other parameters in the model:
Y = b0 + b1x1 + b2x2 + b3x3 + u, where x2 and x3 are highly correlated.
The var(b2) and var(b3) may be large. But the amount of correlation between x2 and
x3 has no direct effect on var(b1). In fact, if x1 is uncorrelated with x2 and x3, then
R1^2 = 0 and var(b1) = σ^2 / SST1, regardless of how much correlation there is between x2
and x3.
If b1 is the parameter of interest, we do not really care about the amount of
correlation between x2 and x3.
4. The tradeoff between bias and variance.
If the true model is y = b0 + b1x1 + b2x2 + u but instead we estimate y = b0 + b’1x1 + u:
a. when b2 is nonzero, b’1 is biased, b1 is unbiased, and Var(b’1) < Var(b1);
b. when b2 is zero, both b’1 and b1 are unbiased, and Var(b’1) < Var(b1) ⇒ a
higher variance for the estimator of b1 is the cost of including an
irrelevant variable in the model.
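Case (a) of the tradeoff can be seen in a small Monte Carlo sketch (all numeric values here are illustrative assumptions): with b2 = 1 and Cov(x1, x2)/Var(x1) = 0.8 by construction, the short regression omitting x2 should center on 1 + 1·0.8 = 1.8, while the full-model slope, computed via partialling out, should center on the true b1 = 1.

```python
# Omitted-variable bias: short regression of y on x1 alone vs. the long
# regression slope obtained by partialling x1 on x2. Values made up.
import random
random.seed(2)

def slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / \
           sum((a - xb) ** 2 for a in x)

short, long_ = [], []
for _ in range(500):
    x1 = [random.gauss(0, 1) for _ in range(100)]
    x2 = [0.8 * a + random.gauss(0, 1) for a in x1]   # x2 correlated with x1
    y  = [2 + 1.0 * a + 1.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]
    short.append(slope(x1, y))                         # omits x2: biased
    # long regression via partialling out: residuals of x1 on x2
    d = slope(x2, x1)
    x1b, x2b = sum(x1) / len(x1), sum(x2) / len(x2)
    v = [a - (x1b + d * (b - x2b)) for a, b in zip(x1, x2)]
    long_.append(sum(vi * yi for vi, yi in zip(v, y)) / sum(vi * vi for vi in v))

mean_short = sum(short) / len(short)
mean_long = sum(long_) / len(long_)
print(mean_short)   # roughly 1.8: biased upward by b2 * Cov(x1, x2)/Var(x1)
print(mean_long)    # roughly 1.0: unbiased
```

The long-regression estimates are unbiased but individually noisier, which is the variance side of the tradeoff.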
VIII. Estimating the standard errors of the estimators:
1. Under A1–A5: E(σhat^2) = σ^2, where σhat^2 = Σ uhat_i^2 / (n − k − 1).
2. Standard deviation of bjhat: sd(bjhat) = σ / [SSTj (1 − Rj^2)]^(1/2).
3. Standard error of bjhat: se(bjhat) = σhat / [SSTj (1 − Rj^2)]^(1/2).
The standard error of bjhat is not a valid estimator of sd(bjhat) if the errors exhibit
heteroskedasticity. Thus, while the presence of heteroskedasticity does not cause
bias in bjhat, it does bias the usual formula for Var(bjhat), which then
invalidates the standard errors.
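A small worked example of the formulas above in the simple-regression case (k = 1, so Rj^2 = 0 and the denominator is just SSTx), under the homoskedasticity assumption A5. The data values are made up:

```python
# Worked check: sigmahat^2 = SSR/(n - 2) and se(b1hat) = sigmahat/sqrt(SSTx).
import math

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 7.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sstx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sstx
b0 = ybar - b1 * xbar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sigma2hat = sum(r * r for r in resid) / (n - 2)   # unbiased for sigma^2 under A1-A5
se_b1 = math.sqrt(sigma2hat) / math.sqrt(sstx)

print(b1, se_b1)   # -> 1.6 and about 0.141
```

Here SSR = 0.2 with n − 2 = 2, so σhat^2 = 0.1, and with SSTx = 5 the standard error is (0.1/5)^(1/2) ≈ 0.141.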