Summary

Summary MEA Module 1, 2, 5X and 7

SKU: doc_632919
Rating: 5.00 (1 reviews)
Author: SabrinaKok

Uploaded on: 06-01-2020
Written in: 2019/2020

This is a short, complete overview of the following modules of Methods of Empirical Analysis: module 1 (introduction), module 2 (time-series), module 5X (qualitative research political science) and module 7 (multilevel panel data).

Summary Methods of Empirical Analysis

Module 1 – Introduction:
Empirical analysis = find useful patterns in data.
The four V’s of Big Data: volume (scale), variety (different forms), velocity (analysis of streaming
data), veracity (uncertainty of data).
Data science = hacking skills + math & statistical knowledge + substantive expertise

To see the effect of an independent variable on a dependent variable we use ordinary least
squares (OLS) linear regression. It tells us how independent variables are related to some
dependent variable: it is a description of the linear relationship between variables.
We cannot know the theoretical relationship; we can only estimate the empirical relationship.
Because variation comes naturally, we include an error term.
There is a theoretical model, predicting the Q’s, and we have actual observations (the P’s).
We extend the model to $y = \beta_1 + \beta_2 x + e$ to account for such deviations, with $e$
being the error term. The estimated counterparts are:
$\hat{y} = b_1 + b_2 x$
$y_i = b_1 + b_2 x_i + \hat{e}_i$

In reality, we don’t know the theoretical relationship (the Q’s); we use our observations (the
P’s) to approximate it. This approximation is called the estimated model. Differences between
observed values and estimated values are called residuals. Thus: the error term is defined as
the difference between the actual observation and the non-random component
($\beta_1 + \beta_2 x$) of the theoretical relationship. The residuals are defined as the
differences between the actual observation and the estimated values ($\hat{y} = b_1 + b_2 x$).
We use these residuals to test whether assumptions are met, to determine the goodness-of-fit of
the model and to test whether model coefficients differ significantly from zero.
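
The distinction between error terms and residuals can be made concrete with a small simulation. This is a minimal sketch, not part of the course material: the coefficient values `BETA1` and `BETA2`, the sample size and the random seed are assumptions chosen for illustration. Only in a simulation do we know the theoretical relationship, so only here can the error terms be observed directly.

```python
import random


def fit_simple_ols(x, y):
    """Return (b1, b2) of the estimated line y-hat = b1 + b2 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_mean - b2 * x_mean
    return b1, b2


# Hypothetical theoretical model (assumed values, known only because we simulate).
BETA1, BETA2 = 2.0, 0.5

rng = random.Random(42)
x = [float(i) for i in range(1, 51)]
errors = [rng.gauss(0.0, 1.0) for _ in x]  # error terms e (unobservable in practice)
y = [BETA1 + BETA2 * xi + ei for xi, ei in zip(x, errors)]

# Estimated model and its residuals (observable).
b1, b2 = fit_simple_ols(x, y)
fitted = [b1 + b2 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(f"theoretical: y = {BETA1} + {BETA2}x + e")
print(f"estimated:   y-hat = {b1:.3f} + {b2:.3f}x")
print(f"sum of residuals: {sum(residuals):.2e}")
```

The residuals sum to (numerically) zero by construction of the OLS fit, while the simulated error terms generally do not, which is one way to see that the two are different quantities.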

The assumptions of OLS:
1. All variables must be measured at interval level and without error;
2. For each value of the independent variables, the expected value of the error term is 0;
3. Homoscedasticity: the variance of the error term is constant, i.e. independent of x;
4. There is no autocorrelation (the error terms are not correlated with each other);
5. Each independent variable is uncorrelated with the error term. If violated, the estimates
are biased, for example through omitted variable bias;
6. There is no multicollinearity (you cannot explain one IV with another IV);
7. The conditional errors are normally distributed: $e_i \mid X_i \sim N(0, \sigma^2)$.
Two additional assumptions:
8. The values of Y are linearly dependent on the predictors (IVs);
9. The model parameters have the same value for each individual (observation).

The OLS regression line is the line for which the sum of the squared residuals is minimized.
This is the Least Squares Principle (LSP): it determines the model coefficients b such that the
sum of squared residuals is as small as possible.
In a linear regression model that satisfies the OLS assumptions, the least squares estimator is the
Best Linear Unbiased Estimator (BLUE) of the model coefficients.
Best = smallest variance among all linear unbiased estimators.
Unbiased = without systematic error: the expected value of the estimated parameter is equal
to its population value.
This result is known as the Gauss-Markov theorem.
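
The Least Squares Principle can be checked numerically. The sketch below uses a small made-up dataset (the five data points and the perturbation sizes are assumptions for illustration): it computes the OLS coefficients and then confirms that nudging either coefficient away from the OLS solution increases the sum of squared residuals.

```python
def sum_squared_residuals(b1, b2, x, y):
    """Sum of squared residuals for the line y-hat = b1 + b2 * x."""
    return sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(x, y))


def fit_simple_ols(x, y):
    """Closed-form least squares solution for one predictor plus intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return y_mean - b2 * x_mean, b2


# Made-up illustration data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]

b1, b2 = fit_simple_ols(x, y)
ssr_ols = sum_squared_residuals(b1, b2, x, y)

# Any other line (here: small shifts of intercept and/or slope) fits worse.
perturbations = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]
worse = [sum_squared_residuals(b1 + d1, b2 + d2, x, y) for d1, d2 in perturbations]

print(f"OLS line: y-hat = {b1:.2f} + {b2:.2f}x, SSR = {ssr_ols:.4f}")
print("every perturbed line has a larger SSR:", all(w > ssr_ols for w in worse))
```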

With residual analysis we check how well our model performs:
1. Global evaluation of the model;
2. Determine the role of individual cases;
3. Check trustworthiness of statistical test outcomes.
We can use graphical instruments and numerical instruments (statistics that indicate the
presence of outliers and influential cases; indicators of dependencies among independent
variables). It is best to combine the two.

Graphical instruments:
- Plots
  o Scatterplot → displays the association between two variables;
  o Partial plot → displays the association between two variables, while controlling for
    the other variables in your model.
- Histogram → shows the distribution of a variable. It tells you whether the data is normally
distributed or not. It is not a problem if your data is not normally distributed, as long as
your error term is normally distributed.

Numerical instruments:
- Lever (leverage) → how far removed is one value of the independent variable from all the
other values of this variable? Thus: how far is an individual value removed from the mean;
- Mahal (Mahalanobis distance) → measures the same thing;
- Cook’s distance D or DfFit → estimate all the parameters with and without the case that is
the potential outlier, and compare the results. This is the most important measure to identify
influential cases.
The first two measures check the dispersion of the independent variables. There are also
commands to look at the residuals (like ZRESID, SDRESID etc.).
Outliers are cases extremely far away from the mean; influential cases are cases that change
the outcome of the model.
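
Leverage and Cook's distance can be computed by hand for a regression with one predictor. The following is a sketch under stated assumptions: the dataset is made up, with the last case deliberately placed far from the other x values and off the line, and the formulas used are the standard textbook ones for simple regression ($h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$ and $D_i = \frac{e_i^2}{p\,s^2} \cdot \frac{h_i}{(1 - h_i)^2}$), which may differ in scaling from what a particular statistics package reports.

```python
def leverage_and_cooks(x, y):
    """Leverage h_i and Cook's distance D_i for the regression y = b1 + b2 * x."""
    n, p = len(x), 2  # p = number of estimated coefficients
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b2 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    b1 = y_mean - b2 * x_mean
    residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    cooks = [e ** 2 / (p * s2) * h / (1 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return leverage, cooks


# Made-up data: ten cases close to y = 1 + 2x, plus one case far out in x
# whose y value does not follow that line.
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.3, -0.2, 0.1, -0.2]
x = [float(i) for i in range(1, 11)] + [25.0]
y = [1 + 2 * xi + ni for xi, ni in zip(x, noise)] + [30.0]

leverage, cooks = leverage_and_cooks(x, y)
for i, (h, d) in enumerate(zip(leverage, cooks)):
    print(f"case {i:2d}: x = {x[i]:5.1f}  leverage = {h:.3f}  Cook's D = {d:.3f}")
```

In this constructed example the last case has both the highest leverage and the largest Cook's distance, so it is influential and not merely unusual. A frequently quoted rule of thumb treats D above 1 as a reason to inspect a case, but cut-offs vary between textbooks.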

We need to test the assumptions described above:
1. Variables must be measured at interval level and without measurement error. The points
should be perfectly on the line. Error in X is difficult to correct; error in Y is not
problematic, because it is captured in the error term.
2. The mean value of the error term is 0 for each X value. If violated: the relationship is
not linear or, more generally speaking, there is a predictor missing.
3. Residuals are homoscedastic. Example of heteroscedasticity: as age (X) increases, the spread
of the residuals increases. Problem: the standard errors are distorted, so significance tests
are unreliable; the estimator is not BLUE anymore, but only LUE (linear and unbiased). You can
detect this with an inspection of the plots and with the Breusch-Pagan test or the White test.
Solution: use a weighted/generalized least squares estimator (weighted least squares: the
values with smaller variance count heavier) or do the test without using the distorted standard
errors: robust standard errors.
4. The residuals are not correlated: no autocorrelation. If violated, the cause of the problem
is often that an important predictor is missing, or that there is a cluster sample. The
solution for a cluster sample is multilevel modelling.
5. Each independent variable is uncorrelated with the error term. If not, there is a
specification error: the model is not correctly specified. This is often violated without
knowing it: how do you know that a variable is missing?
6. No independent variable is perfectly (nor approximately) linearly related to one or more
of the other independent variables in the model. If this is violated and there is an almost
linear relation between explanatory variables, we call this multicollinearity. The
consequence is that the standard errors will be larger than they should be. You can
detect it by looking at correlations, the VIF or the tolerance score (TOL = 1/VIF). A VIF
greater than 5-10 or a TOL smaller than 0.2-0.1 indicates multicollinearity. Solutions for
multicollinearity: add new information (increase sample size) or delete one of the
involved variables.
7. Residuals are normally distributed for each X value. However, the larger your N
becomes, the less a violation of this assumption matters for the statistical tests.
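
The heteroscedasticity check and the robust standard errors from point 3 can be sketched for a regression with one predictor. Everything specific here is an assumption for illustration: the simulated data (error spread proportional to x), the seed, and the choice of the studentized (Koenker) variant of the Breusch-Pagan test, in which the test statistic is n times the R-squared of a regression of the squared residuals on x. With one predictor the statistic has 1 degree of freedom, so its p-value can be written with `math.erfc`. The robust standard error shown is the basic White (HC0) form.

```python
import math
import random


def simple_ols(x, y):
    """Return intercept, slope and residuals of y = b1 + b2 * x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b2 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    b1 = y_mean - b2 * x_mean
    residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
    return b1, b2, residuals


def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    _, _, res = simple_ols(x, y)
    y_mean = sum(y) / len(y)
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - sum(e ** 2 for e in res) / sst


# Simulated heteroscedastic data (assumed setup): error spread grows with x.
rng = random.Random(7)
n = 500
x = [1 + 9 * i / (n - 1) for i in range(n)]
y = [1 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]

b1, b2, res = simple_ols(x, y)
squared = [e ** 2 for e in res]

# Breusch-Pagan (Koenker variant): LM = n * R^2 of squared residuals on x.
lm = max(n * r_squared(x, squared), 0.0)
p_value = math.erfc(math.sqrt(lm / 2))  # chi-square survival function, 1 df

# Standard error of the slope: classical versus robust (White, HC0).
x_mean = sum(x) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
se_classical = math.sqrt(sum(squared) / (n - 2) / sxx)
se_robust = math.sqrt(sum((xi - x_mean) ** 2 * e2 for xi, e2 in zip(x, squared))) / sxx

print(f"LM = {lm:.2f}, p = {p_value:.4g}")
print(f"slope = {b2:.3f}, classical SE = {se_classical:.4f}, robust SE = {se_robust:.4f}")
```

A small p-value rejects homoscedasticity; the slope estimate itself is unchanged by switching to robust standard errors, only the uncertainty attached to it changes.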

So, to summarize, there are a few possible solutions when you detect problems in your data:
- Remove cases
You remove cases from your dataset and treat them as if they were never there. This can be
necessary if individual cases have a disproportionately large influence on the outcome of the
analysis. However, it is not needed with large datasets (>500 cases), because the influence of an
individual case is then generally negligible. Remember: only influential cases need to be
removed, not outliers. Also, don’t remove more than one influential case at the same time.
- Transform variables
Be very careful with changing the dependent variable, because this influences the coefficients
of all x-variables. If the relationship is in reality not linear, add powers of a regressor
(for example x²) as new variables to the model to have a better description of the
relationship. This is called polynomial regression.

- Add new explanatory variables to the model
- Use other estimation techniques (robust)
- Remove variables or increase sample size (to overcome multicollinearity)
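
Before removing a variable because of multicollinearity, the VIF and tolerance mentioned earlier can be checked. The sketch below covers only the special case of two predictors, where the R-squared of regressing one predictor on the other equals their squared correlation, so VIF = 1 / (1 - r²). The data are made up: `x2` is constructed as roughly twice `x1`, which makes the two nearly collinear.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    a_mean, b_mean = sum(a) / n, sum(b) / n
    cov = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b))
    var_a = sum((ai - a_mean) ** 2 for ai in a)
    var_b = sum((bi - b_mean) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF and tolerance for a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    tolerance = 1 - r ** 2  # share of variance NOT explained by the other IV
    return 1 / tolerance, tolerance


# Made-up predictors: x2 is almost exactly 2 * x1.
x1 = [float(i) for i in range(1, 11)]
x2 = [2.5, 3.5, 6.5, 7.5, 10.5, 11.5, 14.5, 15.5, 18.5, 19.5]

vif, tol = vif_two_predictors(x1, x2)
print(f"VIF = {vif:.2f}, TOL = {tol:.4f}")
if vif > 10:
    print("VIF above the 5-10 range: multicollinearity is likely")
```

With more than two predictors, each VIF needs the R-squared of a full regression of that predictor on all the others, which this two-variable shortcut does not provide.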

Dummy variables:
Use dummies if your data is not at interval or ratio level. Create a dummy for every category,
coded as 0 = not present, 1 = present. One dummy must be left out of the model: this is the
reference category. See the example for the interpretation.

Instead of defining dummies with binary/dummy coding, one can also use effect coding
(1, 0, -1) or contrast coding.
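
The two coding schemes can be illustrated with a short sketch. The variable `education` and its categories are hypothetical, and the helper names `dummy_code` and `effect_code` are introduced here only for illustration. In both schemes the reference category gets no column of its own; with effect coding, cases in the reference category are coded -1 on every column instead of 0.

```python
def dummy_code(values, reference):
    """One 0/1 column per category, leaving out the reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def effect_code(values, reference):
    """Like dummy coding, but cases in the reference category are coded -1."""
    categories = sorted(set(values) - {reference})
    return {
        c: [1 if v == c else (-1 if v == reference else 0) for v in values]
        for c in categories
    }


# Hypothetical categorical variable with three levels.
education = ["low", "mid", "high", "mid", "low"]

dummies = dummy_code(education, reference="low")
effects = effect_code(education, reference="low")

print("dummy coding: ", dummies)
print("effect coding:", effects)
```

With dummy coding, each coefficient is interpreted as the difference between that category and the reference category; with effect coding, each coefficient is a deviation from the (unweighted) mean of the category means.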

Written for

Institution: Radboud Universiteit Nijmegen (RU)
Study: Master Economics
Course: Methods of Empirical Analysis

Document information

Uploaded on: January 6, 2020
Number of pages: 16
Written in: 2019/2020
Type: SUMMARY
