Jeffrey M. Wooldridge
Chapter 1 The Nature of Econometrics and Economic Data
Part 1 Regression Analysis with Cross-Sectional Data
Chapter 2 The Simple Regression Model
Chapter 3 Multiple Regression Analysis: Estimation
Chapter 4 Multiple Regression Analysis: Inference
Chapter 5 Multiple Regression Analysis: OLS Asymptotics
Chapter 6 Multiple Regression Analysis: Further Issues
Chapter 7 Multiple Regression Analysis with Qualitative Information: Binary Variables
Chapter 8 Heteroskedasticity
Chapter 9 More on Specification and Data Problems
Part 2 Regression Analysis with Time Series Data
Chapter 10 Basic Regression Analysis with Time Series Data
Chapter 11 Further Issues in Using OLS with Time Series Data
Chapter 12 Serial Correlation and Heteroskedasticity in Time Series Regression
Part 3 Advanced Topics
Chapter 13 Pooling Cross Sections across Time. Simple Panel Data Methods
Chapter 14 Advanced Panel Data Methods
Chapter 15 Instrumental Variables Estimation and Two Stage Least Squares
Chapter 16 Simultaneous Equations Models
Chapter 17 Limited Dependent Variable Models and Sample Selection Corrections
Chapter 18 Advanced Time Series Topics
Chapter 19 Carrying Out an Empirical Project
Appendix: Some Fundamentals of Probability
Introductory Econometrics Study Notes by Zhipeng Yan
Chapter 1 The Nature of Econometrics and Economic Data
I. The goal of any econometric analysis is to estimate the parameters in the model
and to test hypotheses about these parameters; the values and signs of the
parameters determine the validity of an economic theory and the effects of
certain policies.
II. Panel data - advantages:
1. Having multiple observations on the same units allows us to control for certain
unobserved characteristics of individuals, firms, and so on. The use of more than
one observation can facilitate causal inference in situations where inferring
causality would be very hard if only a single cross section were available.
2. They often allow us to study the importance of lags in behavior or the result of
decision making.
Part 1 Regression Analysis with Cross-Sectional Data
Chapter 2 The Simple Regression Model
I. Model: y = b0 + b1x + u
1. Population regression function (PRF): E(y|x) = b0 + b1x
2. Systematic part of y: b0 + b1x
3. Unsystematic part: u
II. Sample regression function (SRF): yhat = b0hat + b1hat*x
1. The PRF is something fixed, but unknown, in the population. Since the SRF is
obtained for a given sample of data, a new sample will generate a different slope
and intercept.
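The contrast between a fixed PRF and a sample-dependent SRF can be seen in a small simulation. This is only a sketch: the function name `draw_sample_and_fit`, the population values b0 = 1 and b1 = 2, the normal distributions, and the sample size are all illustrative assumptions, not part of the notes.

```python
import numpy as np

def draw_sample_and_fit(rng, n=50, b0=1.0, b1=2.0):
    """Draw one sample from y = b0 + b1*x + u and return (b0hat, b1hat).

    b0 and b1 play the role of the fixed (hypothetical) population parameters.
    """
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    y = b0 + b1 * x + u
    # OLS slope and intercept for a simple regression
    b1hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0hat = y.mean() - b1hat * x.mean()
    return b0hat, b1hat

rng = np.random.default_rng(0)
fit_a = draw_sample_and_fit(rng)  # SRF from a first sample
fit_b = draw_sample_and_fit(rng)  # SRF from a second, independent sample
print("sample A (b0hat, b1hat):", fit_a)
print("sample B (b0hat, b1hat):", fit_b)
```

Both fits come from the same population line, yet the two pairs of estimates differ because each is computed from a different sample.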
III. Correlation: it is possible for u to be uncorrelated with x while being
correlated with functions of x, such as x^2.
E(u|x) = E(u) implies Cov(u, x) = 0, but not vice versa.
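A constructed case may make the "not vice versa" part concrete. In the sketch below, x is standard normal and u = x^2 - 1; both choices are assumptions made for illustration. Then Cov(u, x) = E(x^3) = 0, yet E(u|x) = x^2 - 1 clearly depends on x.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
u = x ** 2 - 1.0  # E(u) = 0, but E(u|x) = x^2 - 1 is not constant in x

corr_u_x = np.corrcoef(u, x)[0, 1]        # close to zero in a large sample
corr_u_x2 = np.corrcoef(u, x ** 2)[0, 1]  # u is an exact linear function of x^2

print("corr(u, x)   =", corr_u_x)
print("corr(u, x^2) =", corr_u_x2)
```

So zero correlation with x does not deliver the zero conditional mean condition; u here is perfectly correlated with x^2.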
IV. Algebraic properties of OLS statistics
1. The sum of the OLS residuals is zero.
2. The sample covariance between each regressor and the residuals is zero.
Consequently, the sample covariance between the fitted values and the residuals is
zero.
3. The point of sample means (xbar, ybar) is on the OLS regression line.
4. The goodness-of-fit of the model is invariant to changes in the units of y or x.
5. The homoskedasticity assumption plays no role in showing OLS estimators are
unbiased.
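Properties 1 to 4 above hold in any sample and can be checked numerically. The following is a minimal sketch on simulated data; the helper names `simple_ols` and `r_squared`, the data-generating values, and the rescaling factors are assumptions chosen for illustration.

```python
import numpy as np

def simple_ols(x, y):
    """Return (b0hat, b1hat) for the simple regression of y on x."""
    b1hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0hat = y.mean() - b1hat * x.mean()
    return b0hat, b1hat

def r_squared(x, y):
    """R^2 = 1 - SSR/SST for the simple regression of y on x."""
    b0hat, b1hat = simple_ols(x, y)
    uhat = y - (b0hat + b1hat * x)
    return 1.0 - np.sum(uhat ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=30)
y = 3.0 + 0.5 * x + rng.normal(size=30)  # hypothetical data-generating process

b0hat, b1hat = simple_ols(x, y)
yhat = b0hat + b1hat * x
uhat = y - yhat

sum_resid = np.sum(uhat)                             # property 1
cov_x_resid = np.sum((x - x.mean()) * uhat)          # property 2 (regressor)
cov_fit_resid = np.sum((yhat - yhat.mean()) * uhat)  # property 2 (fitted values)
line_at_xbar = b0hat + b1hat * x.mean()              # property 3
r2_original = r_squared(x, y)
r2_rescaled = r_squared(x / 12.0, 100.0 * y)         # property 4: change units

print(sum_resid, cov_x_resid, cov_fit_resid)
print(line_at_xbar, y.mean())
print(r2_original, r2_rescaled)
```

The first three printed numbers are zero up to floating-point rounding, the regression line passes through the sample means, and R^2 is unchanged by the change of units.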
V. Variance
1. Var(b1) = var(u)/SSTx, where SSTx is the total sample variation in x.
a. More variation in the unobservables (u) affecting y makes it more difficult to
precisely estimate b1.
b. More variability in x is preferred, since the more spread out the sample of
independent variables is, the easier it is to trace out the relationship between E(y|x)
and x. That is, the easier it is to estimate b1.
2. Standard error of the regression (also called the standard error of the estimate or the root
mean squared error): $\hat{\sigma} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$
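The two effects in V.1 and the formula in V.2 can be evaluated directly. This is a sketch under assumed inputs: the two x designs, the error variances, and the function names `slope_variance` and `standard_error_of_regression` are hypothetical choices for illustration.

```python
import numpy as np

def slope_variance(x, sigma2):
    """Var(b1) = sigma^2 / SST_x, conditional on the sample values of x."""
    return sigma2 / np.sum((x - x.mean()) ** 2)

def standard_error_of_regression(x, y):
    """sqrt(SSR / (n - 2)) for the simple regression of y on x."""
    b1hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0hat = y.mean() - b1hat * x.mean()
    uhat = y - (b0hat + b1hat * x)
    return np.sqrt(np.sum(uhat ** 2) / (len(y) - 2))

# Two hypothetical designs with the same n but different spread in x
x_narrow = np.linspace(4.0, 6.0, 20)
x_wide = np.linspace(0.0, 10.0, 20)

var_narrow = slope_variance(x_narrow, sigma2=1.0)
var_wide = slope_variance(x_wide, sigma2=1.0)        # more spread: smaller variance
var_wide_noisy = slope_variance(x_wide, sigma2=4.0)  # larger var(u): larger variance

rng = np.random.default_rng(5)
y = 1.0 + 2.0 * x_wide + rng.normal(scale=1.0, size=x_wide.size)
ser = standard_error_of_regression(x_wide, y)  # estimate of the sd of u

print(var_narrow, var_wide, var_wide_noisy, ser)
```

Because the wide design is the narrow one stretched by a factor of 5 around its mean, SSTx is 25 times larger and Var(b1) is 25 times smaller.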
Chapter 3 Multiple Regression Analysis: Estimation
I. The power of multiple regression analysis is that it allows us to do in
nonexperimental environments what natural scientists are able to do in a
controlled laboratory setting: keep other factors fixed.
II. Model: y = b0 + b1x1 + b2x2 + u
$\hat{b}_1 = \left( \sum_{i=1}^{n} \hat{v}_{i1} y_i \right) / \left( \sum_{i=1}^{n} \hat{v}_{i1}^2 \right)$, where the $\hat{v}_{i1}$ (written v below) are the OLS residuals from a simple regression of x1
on x2.
1. v is the part of x1 that is uncorrelated with x2, or v is x1 after the effects of x2 have
been partialled out, or netted out. Thus, b1 measures the sample relationship
between y and x1 after x2 has been partialled out.
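The partialling-out formula can be verified against a direct multiple regression. The sketch below uses simulated data; the coefficient values and variable names are assumptions for illustration, and `np.linalg.lstsq` stands in for whatever OLS routine one prefers.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)                  # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)  # hypothetical population model

# Direct route: regress y on an intercept, x1, and x2
X = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Partialling out: regress x1 on x2 (with an intercept) and keep the residuals v
Z = np.column_stack([np.ones(n), x2])
v = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Slope on x1 from the residual formula: sum(v * y) / sum(v^2)
b1_partial = np.sum(v * y) / np.sum(v ** 2)

print("b1 from multiple regression:", b_full[1])
print("b1 from partialling out:   ", b1_partial)
```

The two numbers agree up to rounding, which is the sense in which b1 measures the relationship between y and x1 after x2 has been netted out.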
III. Goodness-of-fit
1. R^2 = the squared correlation coefficient between the actual y and the fitted
values yhat.
2. R^2 never decreases when additional regressors are added to the model, because
the sum of squared residuals never increases.
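Both statements about R^2 can be checked on a small simulated data set. In this sketch the helper `fit_r2`, the data-generating process, and the extra regressor `noise_regressor` (unrelated to y by construction) are assumptions made for illustration.

```python
import numpy as np

def fit_r2(X, y):
    """Return (R^2, fitted values) for an OLS fit of y on X (X includes a constant)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    yhat = X @ b
    r2 = 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, yhat

rng = np.random.default_rng(6)
n = 100
x1 = rng.normal(size=n)
noise_regressor = rng.normal(size=n)  # irrelevant variable, generated independently of y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, noise_regressor])

r2_small, yhat_small = fit_r2(X_small, y)
r2_big, _ = fit_r2(X_big, y)
corr_sq = np.corrcoef(y, yhat_small)[0, 1] ** 2  # squared corr(y, yhat)

print("R^2:", r2_small, " squared correlation:", corr_sq)
print("R^2 after adding an irrelevant regressor:", r2_big)
```

R^2 matches the squared correlation between y and yhat, and adding even a pure-noise regressor does not lower it.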
IV. Regression through the origin:
1. OLS residuals no longer have a zero sample average.
2. R^2 can be negative. This means that the sample average “explains” more of the
variation in y than the explanatory variables.
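A five-point example shows both effects of dropping the intercept. The data below are made up for illustration, and R^2 is computed as 1 - SSR/SST, which is the assumption under which it can turn negative.

```python
import numpy as np

# Hypothetical data: y falls as x rises, and the true intercept is far from zero
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 9.0, 8.0, 7.0, 6.0])

# Regression through the origin: slope = sum(x*y) / sum(x^2), no intercept
b1_origin = np.sum(x * y) / np.sum(x ** 2)
uhat = y - b1_origin * x

resid_mean = uhat.mean()             # not zero without an intercept
ssr = np.sum(uhat ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2_origin = 1.0 - ssr / sst          # negative when SSR > SST

print("slope:", b1_origin)           # 2.0
print("mean residual:", resid_mean)  # 2.0
print("R^2:", r2_origin)             # -10.0
```

Here SSR = 110 exceeds SST = 10, so simply predicting the sample mean of y fits better than the line forced through the origin.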
V. MLR Assumptions:
A1: Linear in parameters.
A2: Random sampling.
A3: Zero conditional mean: E(u|x1, x2, …, xk) = 0.
When A3 holds, we say that we have exogenous explanatory variables. If xj is
correlated with u for any reason, then xj is said to be an endogenous explanatory
variable.
A4: No perfect collinearity.
A1 – A4 imply unbiasedness of OLS.
VI. Overspecifying the model:
1. Including one or more irrelevant variables does not affect the unbiasedness of the
OLS estimators.
VII. Variance of OLS estimators:
A5: Homoskedasticity.
1. Gauss-Markov assumptions: A1 – A5.
2. $\mathrm{Var}(b_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)}$, where $R_j^2$ is from regressing xj on all other independent
variables (and including an intercept).
a. The error variance, σ^2, is a feature of the population; it has nothing to do with the
sample size.
b. SSTj: the total sample variation in xj. A small sample size leads to a small value of SSTj
and hence a large var(bj).
c. $R_j^2$: high (but not perfect) correlation between two or more independent variables is called
multicollinearity.
3. A high degree of correlation between certain independent variables can be
irrelevant as to how well we can estimate other parameters in the model:
y = b0 + b1x1 + b2x2 + b3x3 + u, where x2 and x3 are highly correlated.
Then var(b2) and var(b3) may be large. But the amount of correlation between x2 and x3
has no direct effect on var(b1). In fact, if x1 is uncorrelated with x2 and x3, then
$R_1^2 = 0$ and $\mathrm{Var}(b_1) = \sigma^2 / SST_1$, regardless of how much correlation there is between x2
and x3.
If b1 is the parameter of interest, we do not really care about the amount of
correlation between x2 and x3.
4. The tradeoff between bias and variance.
If the true model is y = b0 + b1x1 + b2x2 + u but we instead estimate y = b0 + b’1x1 + u
(with x1 and x2 correlated in the sample):
a. when b2 is nonzero, b’1 is biased, b1 is unbiased, and var(b’1) < var(b1);
b. when b2 is zero, b’1 is unbiased, b1 is unbiased, and var(b’1) < var(b1): a
higher variance for the estimator of b1 is the cost of including an
irrelevant variable in a model.
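Cases a and b of the bias-variance tradeoff can be illustrated with a small Monte Carlo experiment that holds the regressors fixed and redraws the errors. Everything specific here is an assumption for illustration: the sample size, the number of replications, the coefficient values (b0 = 1, b1 = 2), the strength of the correlation between x1 and x2, and the helper name `simulate_slopes`.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2000
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)  # x1 and x2 are correlated in the sample
X_long = np.column_stack([np.ones(n), x1, x2])  # model that includes x2
X_short = X_long[:, :2]                         # model that omits x2

def simulate_slopes(b2):
    """Return (short-model, long-model) estimates of the x1 slope across replications."""
    U = rng.normal(size=(reps, n))            # fresh errors in each replication
    Y = (1.0 + 2.0 * x1 + b2 * x2) + U        # shape (reps, n); true b1 = 2
    short = np.linalg.lstsq(X_short, Y.T, rcond=None)[0][1]
    long_ = np.linalg.lstsq(X_long, Y.T, rcond=None)[0][1]
    return short, long_

short_nz, long_nz = simulate_slopes(b2=3.0)  # case a: x2 belongs in the model
short_z, long_z = simulate_slopes(b2=0.0)    # case b: x2 is irrelevant

print("b2 = 3: mean short", short_nz.mean(), " mean long", long_nz.mean())
print("b2 = 3: var short ", short_nz.var(), " var long ", long_nz.var())
print("b2 = 0: mean short", short_z.mean(), " mean long", long_z.mean())
print("b2 = 0: var short ", short_z.var(), " var long ", long_z.var())
```

In both cases the short regression has the smaller sampling variance; it is centered on the true slope only when b2 = 0, while the long regression is centered on it in both cases.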
VIII. Estimating: standard errors of estimators.
1. Under A1-A5: $E(\hat{\sigma}^2) = \sigma^2$, where $\hat{\sigma}^2 = \frac{1}{n-k-1} \sum_{i=1}^{n} \hat{u}_i^2$ (written σ’^2 in these notes; σ’ is σhat).
2. Standard deviation of bj’: $\mathrm{sd}(b_j') = \frac{\sigma}{\sqrt{SST_j (1 - R_j^2)}}$
3. Standard error of bj’: $\mathrm{se}(b_j') = \frac{\hat{\sigma}}{\sqrt{SST_j (1 - R_j^2)}}$
The standard error of bj’ is not a valid estimator of sd(bj’) if the errors exhibit
heteroskedasticity. Thus, while the presence of heteroskedasticity does not cause
bias in the bj’, it does lead to bias in the usual formula for Var(bj’), which then
invalidates the standard errors.
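The standard error formula can be computed term by term and compared with the usual matrix expression, the square root of the diagonal of σhat^2 (X'X)^(-1). The sketch below assumes homoskedastic simulated errors; the function name `ols_standard_errors`, the coefficient values, and the design (x2 and x3 highly correlated, x1 generated independently of both) are illustrative assumptions.

```python
import numpy as np

def ols_standard_errors(X, y):
    """Return (b, se, sigma2_hat) using se(b_j) = sigma_hat / sqrt(SST_j * (1 - R_j^2)).

    X must have a column of ones first; se covers the slopes in columns 1..k.
    """
    n, p = X.shape                            # p = k + 1
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    uhat = y - X @ b
    sigma2_hat = np.sum(uhat ** 2) / (n - p)  # n - k - 1 degrees of freedom
    se = np.empty(p - 1)
    for j in range(1, p):
        xj = X[:, j]
        others = np.delete(X, j, axis=1)      # intercept plus the other regressors
        vj = xj - others @ np.linalg.lstsq(others, xj, rcond=None)[0]
        sst_j = np.sum((xj - xj.mean()) ** 2)
        r2_j = 1.0 - np.sum(vj ** 2) / sst_j  # R_j^2 from regressing x_j on the others
        se[j - 1] = np.sqrt(sigma2_hat / (sst_j * (1.0 - r2_j)))
    return b, se, sigma2_hat

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x2 + 0.1 * rng.normal(size=n)            # x2 and x3 are highly correlated
X = np.column_stack([np.ones(n), x1, x2, x3])
y = 1.0 + 2.0 * x1 + 1.0 * x2 + 1.0 * x3 + rng.normal(size=n)

b, se, sigma2_hat = ols_standard_errors(X, y)
se_matrix = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))[1:]

print("se from SST_j and R_j^2:", se)
print("se from (X'X)^-1:       ", se_matrix)
```

The two sets of standard errors coincide. In this design the slopes on x2 and x3 have much larger standard errors than the slope on x1, consistent with the earlier point that correlation between x2 and x3 has no direct effect on var(b1).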