Summary: Everything you need to know about modules 1, 2, 4 and 5(x) of Methods of Empirical Analysis

Pages: 27 | Uploaded on: 19-03-2023 | Written in: 2022/2023

Within this document you will find everything you need to know to prepare for the exam of Methods of Empirical Analysis. It includes handy lists, short summaries of important literature and lectures, warnings, R output and much more.


Module 1

OLS assumptions:

1. All variables must be measured at interval level and without measurement error.
   - Measurement error in Y is not problematic, as it is absorbed by the error term. Measurement error in X is problematic: it biases the coefficient of the X variable toward zero (the effect is underestimated). The only remedy is better data collection.
   - The assumption is violated with nominal or ordinal data. Address this by adding dummy variables.
2. The mean value of the error term is 0 for each value of X.
   - Not really a problem. With an intercept in the model, OLS draws the line so that the residuals average to 0, and a constant non-zero mean of the error term is simply absorbed by the intercept. So there is no constant over- or underestimation, i.e. no average positive or negative residual.
3. Error terms are homoscedastic.
   - The variance of the error term should be the same for each value of X.
   - Consequence if violated: the estimates are LUE instead of BLUE (still linear and unbiased, but no longer best). The standard errors of the parameters are biased, so statistical tests are not reliable.
   - You can detect heteroscedasticity by inspecting a residual plot or by using a Breusch-Pagan test:

     The null hypothesis (H0) = there is no heteroscedasticity.
     The alternative hypothesis (H1) = there is heteroscedasticity.

     If the p-value is lower than alpha = 0.05, reject H0: there is heteroscedasticity. If it is higher, H0 cannot be rejected and we assume the error terms are homoscedastic.

   - Solutions if heteroscedasticity is detected:
      o Robust standard errors (White's heteroscedasticity-consistent standard errors). These correct the standard errors for heteroscedasticity. This usually adds an extra margin of error, so significance is less easily detected.
      o Generalized least squares (GLS) estimator. We tell the model how the variance changes with X (for example, the larger X is, the larger the variance). The estimates are then BLUE again.
4. Error terms are not correlated (no autocorrelation).
   - You should not be able to predict the next error term from the previous ones.
   - Likely causes: a missing predictor, or cluster sampling without taking the clusters into account. For example, pupils may be sampled from one class and from another. Children in the same class may share a teacher or receive better education, so their error terms are correlated.
   - Solution for cluster sampling: model the data as multilevel data.
   - Note that time series data often show autocorrelation. That needs to be addressed; see module 2.
5. Each independent variable is uncorrelated with the error term (omitted variable bias).
   - Causes of violation:
      o A wrong functional form, such as assuming linearity where there is none.
      o Assuming a direct effect where there is an interaction effect.
      o Omitted variables.
     The first two are theory-based problems. The third can arise because you do not have data about that variable. If the omitted variable is correlated with both the dependent and the independent variable, the coefficient of that independent variable is biased, because it partly picks up the effect of the omitted variable. If the omitted variable is positively related to both, the coefficient will be stronger than its real effect.
6. No independent variable is perfectly linearly related to one or more of the other independent variables in the model (multicollinearity).
   - With a perfect linear relation, the coefficients cannot be estimated at all. A strong but imperfect relation between independent variables increases the standard errors of the coefficients. The estimates thus become less precise and may be sensitive to adding a few new observations. (Note that the R² of the model is not affected.)
   - You can detect multicollinearity by looking at the correlation between two independent variables. However, multicollinearity can also involve several independent variables at once, which pairwise correlations miss. It is therefore better to apply VIF/TOL.
      o This means regressing each independent variable on all other independent variables. A high R² then indicates multicollinearity. The VIF is calculated as 1/(1 - R²), so the higher the R², the higher the VIF. A VIF greater than 5-10 indicates multicollinearity. The related tolerance is TOL = 1 - R² = 1/VIF, so a TOL below 0.2-0.1 indicates multicollinearity.
   - Solutions: increase the sample size, or delete one of the involved variables (if a dummy is involved, always remove the dummy).
7. Error terms are normally distributed for each value of X.
   - Not very important, as the estimates remain rather robust (especially in large samples).
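The Breusch-Pagan decision rule from assumption 3 can be tried out with a minimal pure-Python sketch. It implements the studentized (Koenker) variant for a model with one X:

- Regress the squared residuals on X.
- Compare n·R² of that auxiliary regression with a chi-square distribution with 1 degree of freedom.

The function names and data are hypothetical. For real analyses you would use the built-in test of your package (in R, for instance, `bptest()` from the lmtest package, whose default is, as far as I know, this studentized form).

```python
import math


def simple_ols(x, y):
    """Intercept and slope of the OLS line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r_squared(x, y):
    """R^2 of a regression of y on a single x (the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def breusch_pagan(x, y):
    """Studentized (Koenker) Breusch-Pagan test for a model with one X.

    H0: homoscedasticity. Returns (LM statistic, p-value).
    """
    b0, b1 = simple_ols(x, y)
    # Auxiliary regression: squared residuals on X
    sq_resid = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, sq_resid)
    # Under H0, LM ~ chi-square(1); its upper tail equals erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


# Hypothetical data: in y_het the size of the error grows with x,
# in y_hom the error always has size 1
x = list(range(1, 21))
y_het = [2 + 0.5 * xi + 0.3 * xi * (-1) ** xi for xi in x]
y_hom = [2 + 0.5 * xi + 1.0 * (-1) ** xi for xi in x]

_, p_het = breusch_pagan(x, y_het)
_, p_hom = breusch_pagan(x, y_hom)
print(f"heteroscedastic data: p = {p_het:.4f}")  # below 0.05: reject H0
print(f"homoscedastic data:   p = {p_hom:.4f}")  # above 0.05: keep H0
```

In the first data set H0 is rejected, so heteroscedasticity is present. In the second, H0 cannot be rejected.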
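The bias described under assumption 5 follows the standard omitted-variable-bias result. If z is left out, the slope of x in the short regression equals beta_x + beta_z * delta, where delta is the slope from regressing z on x. The sketch below checks this on made-up, error-free data; all names are hypothetical. Because z is positively related to both x and y, the effect of x is overestimated.

```python
def slope(x, y):
    """OLS slope of a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: z is correlated with x and also affects y
x = [1, 2, 3, 4, 5]
z = [1, 3, 2, 5, 4]
beta_x, beta_z = 2.0, 3.0
y = [1 + beta_x * xi + beta_z * zi for xi, zi in zip(x, z)]  # no error term

short = slope(x, y)  # regression of y on x only: z is omitted
delta = slope(x, z)  # how strongly the omitted z moves with x

print(short)                    # about 4.4 instead of the true effect 2.0
print(beta_x + beta_z * delta)  # the same value: true effect plus bias
```

With a negative beta_z (or a negative relation between z and x), the same formula gives an underestimated effect instead.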
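The VIF/TOL procedure from assumption 6 can be sketched in pure Python:

1. Regress each independent variable on all the others.
2. Take the R² of that regression.
3. Compute VIF = 1/(1 - R²) and TOL = 1/VIF.

The small Gaussian-elimination solver and the data are hypothetical and only keep the example self-contained. In practice a package function does this for you (in R, for instance, `vif()` from the car package).

```python
def solve(a, b):
    """Solve a x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, predictors):
    """R^2 of an OLS regression of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def vif_and_tol(columns):
    """Return (VIF, TOL) for each independent variable in columns."""
    out = []
    for j, xj in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        r2 = r_squared(xj, others)  # auxiliary regression on all other X's
        vif = 1 / (1 - r2)
        out.append((vif, 1 / vif))
    return out


# Hypothetical independent variables; x3 is almost exactly 2 * x1
x1 = [1, 2, 3, 4, 5]
x2 = [1, 3, 2, 5, 4]
x3 = [2.1, 4.0, 6.2, 7.9, 10.1]

two = vif_and_tol([x1, x2])
three = vif_and_tol([x1, x2, x3])
print(two)    # VIF about 2.78 and TOL about 0.36 for both: no problem
print(three)  # VIF of x1 and x3 far above 10: multicollinearity
```

The pairwise correlation of x1 and x2 (0.8) already predicts the first result. The second case shows how a near-duplicate variable drives the VIF up.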

Additional assumptions:

1. Y depends linearly on the independent variables.
   - Not really a problem; it mainly affects interpretation.
   - Be aware: you should not add polynomials just to make the model fit better. With twenty polynomial terms the line goes all over the place and loses its relevance.
2. Parameters of the model should have the same value for all individuals.
   - This does not hold where there is an interaction effect. To model such an effect, multiply two variables with each other and add the product to the model.
      o Note that creating an interaction variable often means that you have to center the interacting variables first (subtract their mean). Take the model Income = β0 + β1*educ + β2*year + β3*EducYear, with EducYear = educ*year. Here β1 is the effect of educ when year = 0, which does not always make sense. You might instead want the effect of educ when year has its mean value, which is what β1 gives after centering.
         - Note further that the mean of categorical data does not make sense, so we only center variables measured at interval or ratio level.
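The centering argument can be checked numerically. The sketch below generates hypothetical, error-free data from Income = β0 + β1*educ + β2*year + β3*EducYear and fits the model twice, once with raw and once with centered variables. With raw variables, the educ coefficient is the effect at year = 0. After centering, it equals β1 + β3 * mean(year), the effect of educ at the average year. The data, coefficients and the small solver are made up for illustration.

```python
def solve(a, b):
    """Solve a x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, predictors):
    """OLS coefficients [b0, b1, ...] of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations (X'X) b = X'y


# Hypothetical, error-free data from
# Income = 5 + 1.0*educ + 0.5*year + 0.2*EducYear
educ = [8, 10, 12, 12, 14, 16, 16, 18]
year = [1, 3, 2, 5, 4, 6, 2, 7]
income = [5 + 1.0 * e + 0.5 * t + 0.2 * e * t for e, t in zip(educ, year)]

# Raw variables: the educ coefficient is the effect of educ when year = 0
educ_year = [e * t for e, t in zip(educ, year)]
raw = ols(income, [educ, year, educ_year])

# Centered variables: the educ coefficient is the effect at the mean year
mean_educ = sum(educ) / len(educ)
mean_year = sum(year) / len(year)
educ_c = [e - mean_educ for e in educ]
year_c = [t - mean_year for t in year]
educ_year_c = [e * t for e, t in zip(educ_c, year_c)]
centered = ols(income, [educ_c, year_c, educ_year_c])

print(raw[1])               # about 1.0
print(centered[1])          # about 1.0 + 0.2 * 3.75 = 1.75
print(raw[3], centered[3])  # interaction coefficient about 0.2 in both
```

Centering changes the meaning of the main-effect coefficients, not the interaction coefficient or the fit of the model.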

If these assumptions are met, the OLS estimates are BLUE: best linear unbiased estimates.
- "Best" indicates the smallest variance of the parameter estimates among all linear unbiased estimators, with correctly calculated standard errors.
- "Linear" means the estimator is a linear function of the observed values of Y, in a model that is linear in its parameters.
- "Unbiased" means that, on average over repeated samples, the coefficients in the model equal those of the population.

The error term indicates the difference between the actual observed values and the values predicted by the theoretical relation. The model has two components:
- A non-random component, based on theory, describing how the independent variables influence the dependent variable.
- A random component, which is the error term.

The error term thus captures:

- Omitted variables
- Random human behaviour
- Approximation errors

The residuals indicate the difference between the observed values and the estimated (predicted) values.

Note: we cannot observe the error terms, only the residuals. We therefore use the residuals to test the assumptions and the goodness of fit of the model.

Least squares principle: the coefficients are chosen such that the sum of the squared residuals is minimized.
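A minimal sketch of the least squares principle for one X, with hypothetical data. The closed-form OLS solution gives the smallest sum of squared residuals, so nudging the intercept or slope in any direction increases it. The sketch also shows that, with an intercept in the model, the residuals sum to (numerically) zero, which is why assumption 2 is rarely a problem in practice.

```python
def fit(x, y):
    """Closed-form OLS intercept and slope for one X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1


def ssr(b0, b1, x, y):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

b0, b1 = fit(x, y)
best = ssr(b0, b1, x, y)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Any other line has a larger sum of squared residuals
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]
worse = [ssr(b0 + d0, b1 + d1, x, y) > best for d0, d1 in nudges]

print(b0, b1, best)
print(all(worse))      # True
print(sum(residuals))  # about 0 (rounding error only)
```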

Influential case: an observation that has a strong influence on the regression coefficients (in large datasets, a single observation has less influence). It can be measured with DFFIT: the difference between the predicted Y for that observation when it is included in the estimation and when it is left out.

- Only remove influential cases if their influence is disproportionately large, and make a strong case for why you remove them. Remove only one influential case at a time, as removing one may already solve the disproportionate influence on the coefficients.
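DFFIT as defined above can be computed by refitting the model without each observation in turn. Below is a pure-Python sketch for one X with hypothetical data, where the last case (x = 10, y = 0) breaks the pattern y = x followed by the others. This is the raw difference in predictions; many packages instead report a scaled version (DFFITS), which divides this difference by a standard error.

```python
def fit(x, y):
    """Closed-form OLS intercept and slope for one X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1


def dffit(x, y):
    """DFFIT_i: prediction for case i from the full model minus the
    prediction for case i from the model estimated without case i."""
    b0, b1 = fit(x, y)
    result = []
    for i in range(len(x)):
        c0, c1 = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])  # leave case i out
        result.append((b0 + b1 * x[i]) - (c0 + c1 * x[i]))
    return result


# Hypothetical data: the last case does not follow the pattern y = x
x = [1, 2, 3, 4, 5, 10]
y = [1, 2, 3, 4, 5, 0]

d = dffit(x, y)
most = max(range(len(d)), key=lambda i: abs(d[i]))
print([round(v, 2) for v in d])
print(most, round(d[most], 2))  # 5 -8.36
```

Leaving out the last case shifts its own prediction by about 8.4 units, far more than any other case. Following the advice above, this is the one case to examine and possibly remove before re-checking the others.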

Outliers: individual observations for which the model fits badly (large residual).

Dummy variables are variables representing data measured at nominal level (country, city, religion; no order between them) or ordinal level (education, social class,