
Summary MEA Module 1, 2, 5X and 7

Pages
16
Uploaded on
06-01-2020
Written in
2019/2020

This is a short, complete overview of the following modules of Methods of Empirical Analysis: module 1 (introduction), module 2 (time-series), module 5X (qualitative research political science) and module 7 (multilevel panel data).


Summary Methods of Empirical Analysis

Module 1 – Introduction:
Empirical analysis = find useful patterns in data.
The four V’s of Big Data: volume (scale), variety (different forms), velocity (analysis of streaming
data), veracity (uncertainty of data).
Data science = hacking skills + math & statistical knowledge + substantive expertise

To see the effect of an independent variable on a dependent variable we use ordinary least
squares (OLS) regression. It tells us how independent variables are related to some dependent
variable. It is a description of the linear relationship between variables.
We cannot know the theoretical relationship, but we can estimate the empirical relationship.
Because variation comes naturally, we include an error term:
ŷ = b1 + b2x
yi = b1 + b2xi + êi
There is a theoretical model, predicting the Q’s, and we have actual observations (the P’s).
We extend the model to y = β1 + β2x + e to account for such deviations, with e being the
error term.

In reality, we don’t know the theoretical relationship (the Q’s), we use our observations (P’s) to
approximate the theoretical relationship. This is called the estimated model. Differences
between observed values and estimated values are called residuals. Thus: the error term is
defined as the difference between the actual observation and the non-random component (y =
b0 + b1x1) of the theoretical relationship. The residuals are defined as the differences between
the actual observation and the estimated values (ŷ = b0 + b1x1). We use these residuals to test
whether assumptions are met, to determine goodness-of-fit of the model and to calculate the
likelihood that model coefficients are different from zero.
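
The difference between estimated values and residuals can be made concrete with a small sketch. The data below are hypothetical toy numbers (not from the course material), and the closed-form formulas used are those for a regression with a single predictor.

```python
# Hypothetical toy data (an assumption for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form OLS estimates for one predictor.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
b1 = s_xy / s_xx             # slope
b0 = y_mean - b1 * x_mean    # intercept

# Estimated values (y-hat) and residuals (observed minus estimated).
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, y_hat)]

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("residuals:", [round(e, 3) for e in residuals])
```

With an intercept in the model, the residuals always sum to zero (up to rounding), which is a quick sanity check on the fit.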

The assumptions of OLS:
1. All variables must be measured at interval level and without error;
2. For each value of the independent variables, the expected error term should be 0;
3. Homoscedasticity: the variance of the error term is constant, i.e. independent of x;
4. There is no autocorrelation (the error terms are not correlated);
5. Each independent variable is uncorrelated with the error term. If violated, we have
omitted variable bias;
6. There is no multicollinearity (you cannot explain one IV with another IV);
7. The conditional errors are normally distributed: ei | Xi ~ N(0, σ²).
Two additional assumptions:
8. The values of Y are linearly dependent on the predictors (IV’s);
9. Parameters of the model have for each individual (observation) the same value.

The OLS-regression line is the line where the sum of the squared residuals is minimized. This is
the Least Squares Principle. LSP determines the model coefficients b such that the sum of
squared residuals is minimized.
In a linear regression model that satisfies the OLS assumptions, the least squares estimator is the
Best Linear Unbiased Estimator (BLUE) of the model coefficients (and of each linear combination
of them).
Best = smallest variance among all linear unbiased estimators.
Unbiased = without systematic error: the expected value of the parameter estimated by the model
is equal to its population value.
This result is known as the Gauss-Markov theorem.
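
The Least Squares Principle can be checked numerically: any line other than the OLS line should have a larger sum of squared residuals. The sketch below uses hypothetical toy data and two helper functions (`ols`, `ssr`) that are named here for illustration only.

```python
def ols(x, y):
    """Closed-form OLS intercept and slope for a single predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_mean - b1 * x_mean, b1


def ssr(b0, b1, x, y):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (an assumption for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols(x, y)
best = ssr(b0, b1, x, y)

# Shift the intercept and/or slope a little and recompute the SSR.
shifts = (-0.1, 0.0, 0.1)
perturbed = [
    ssr(b0 + d0, b1 + d1, x, y)
    for d0 in shifts
    for d1 in shifts
    if (d0, d1) != (0.0, 0.0)
]

print("SSR at OLS estimates:", round(best, 4))
print("smallest SSR among shifted lines:", round(min(perturbed), 4))
```

This only probes a few nearby lines; it illustrates the principle rather than proving it.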

With residual analysis we check how well our model performs:
1. Global evaluation of the model;
2. Determine the role of individual cases;
3. Check trustworthiness of statistical test outcomes.
We can use graphical instruments and numerical instruments (statistics that indicate the
presence of outliers and influential cases; indicators of dependencies among independent
variables). It is best to combine the two.

Graphical instruments:
- Plots
  - Scatterplot → displays association between two variables;
  - Partial plot → displays association between two variables, while controlling for
    other variables in your model.
- Histogram → shows the density function. Tells if the data is normally distributed or not.
It is not a problem if your data is not normally distributed, as long as your error term is
normally distributed.

Numerical instruments:
- Leverage (LEVER) → how far removed is one value of the independent variable from all the
other values of this variable? Thus: how far is an individual value removed from the mean;
- Mahalanobis distance (MAHAL) → does the same;
- Cook’s distance D or DfFit → estimate all the parameters with the value that is the
potential outlier, and without it. This is the most important measure to identify influential
cases.
Leverage and Mahalanobis distance check the dispersion of the independent variables. There are
also commands to look at the residuals (like ZRESID, SDRESID etc.).
Outliers are cases extremely far away from the mean; influential cases will change the outcome
of the model.
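
As a rough sketch of how leverage and Cook's distance behave, the example below computes both for a single-predictor regression using the standard textbook formulas. The data are hypothetical; the last case is deliberately placed far from the other x-values and off the trend of the rest.

```python
# Hypothetical toy data: the last case has an extreme x-value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.0]

n = len(x)
p = 2  # number of estimated parameters (intercept and slope)

x_mean = sum(x) / n
y_mean = sum(y) / n
s_xx = sum((xi - x_mean) ** 2 for xi in x)
b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / s_xx
b0 = y_mean - b1 * x_mean
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Leverage: distance of each x-value from the mean of x.
leverage = [1 / n + (xi - x_mean) ** 2 / s_xx for xi in x]

# Cook's distance: combines the size of the residual with the leverage.
s2 = sum(e ** 2 for e in residuals) / (n - p)
cooks_d = [
    (e ** 2 / (p * s2)) * h / (1 - h) ** 2
    for e, h in zip(residuals, leverage)
]

for i, (h, d) in enumerate(zip(leverage, cooks_d), start=1):
    print(f"case {i}: leverage = {h:.3f}, Cook's D = {d:.3f}")
```

In this toy data the last case has both the highest leverage and by far the largest Cook's distance, so it is the one that would change the fitted line most if it were left out.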

We need to test the assumptions described above:
1. Variables must be measured at interval level and without measurement error. The points
should be perfectly on the line. Error in X is difficult to correct; error in Y is not
problematic, because it is captured in the error term.




2. The mean value of the error term is 0 for each X value. If violated: the relationship is
not linear, or more generally speaking: there is a predictor missing.
3. Residuals are homoscedastic. Heteroscedasticity: for example, as age (X) increases, the
spread of the residuals increases. Problem: the standard errors are distorted, so significance
tests are unreliable; the model is no longer BLUE, only LUE. You can detect this by inspecting
the residual plots and with the Breusch-Pagan test or the White test. Solution: use a
weighted/generalized least squares estimator (weighted least squares: the values with smaller
variance count more heavily) or do the test without using the distorted standard errors:
robust standard errors.
4. The residuals are not correlated, no autocorrelation. If violated, the cause of the problem
is often that an important predictor is missing, or that there is a cluster sample. The
solution for this is multilevel modelling.




5. Each independent variable is uncorrelated with the error term. If not, there is
specification error, the model is not correctly specified. This is often violated without
knowing it: how do you know that a variable is missing?
6. No independent variable is perfectly (nor approximately) linearly related to one or more
of the other independent variables in the model. If this is violated and there is an almost
linear relation between explanatory variables, we call this multicollinearity. The
consequence is that the standard errors will be larger than they should be. You can
detect it by looking at correlations, the VIF or tolerance score (1/VIF). A VIF greater than
5-10 or a TOL smaller than 0.2-0.1 indicates multicollinearity. Solutions for
multicollinearity: add new information (increase sample size) or delete one of the
involved variables.
7. Residuals are normally distributed for each X value. However, the larger your N
becomes, the less of a problem a violation is, because the test statistics are then
approximately valid anyway.
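
The VIF and tolerance mentioned under assumption 6 can be sketched for the simplest case of two predictors. With only two predictors, the R² of regressing one on the other equals their squared correlation, so VIF = 1 / (1 - r²). The function name and the data below are hypothetical.

```python
def vif_two_predictors(a, b):
    """VIF for predictor a when b is the only other predictor in the model."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    s_ab = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_mean) ** 2 for ai in a)
    s_bb = sum((bi - b_mean) ** 2 for bi in b)
    r_squared = s_ab ** 2 / (s_aa * s_bb)  # squared Pearson correlation
    return 1 / (1 - r_squared)


# Hypothetical predictors: x2 is almost a copy of x1, x3 is not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]
x3 = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]

vif_high = vif_two_predictors(x1, x2)
vif_low = vif_two_predictors(x1, x3)

print(f"x1 with x2: VIF = {vif_high:.1f}, tolerance = {1 / vif_high:.4f}")
print(f"x1 with x3: VIF = {vif_low:.2f}, tolerance = {1 / vif_low:.4f}")
```

With more than two predictors, the R² has to come from regressing each predictor on all the others, which this sketch does not cover.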

So, to summarize, there are a few possible solutions when you detect problems in your data:
- Remove cases
You remove cases from your dataset and treat them as if they were never there. This can be
necessary if individual cases have a disproportionately large influence on the outcome of the
analysis. However, it is not needed with large datasets (>500 cases), because the influence of an
individual case is then generally negligible. Remember: only influential cases need to be
removed, not outliers. Also, don’t remove more than one influential case at the same time.
- Transform variables
Be very careful with changing the dependent variable, because this influences the coefficients of
all x-variables. If the relationship is in reality not linear, add powers of a regressor (such as
x²) as new variables to the model to have a better description of the relationship. This is
called polynomial regression.




- Add new explanatory variables to the model
- Use other estimation techniques (robust)
- Remove variables or increase sample size (to overcome multicollinearity)
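
As an illustration of the "robust" option, the sketch below compares the classical standard error of the slope with a heteroscedasticity-robust one (White's HC0 estimator) for a single-predictor regression. The data are hypothetical and constructed so that the spread of y grows with x.

```python
import math

# Hypothetical toy data: deviations from the trend grow with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.8, 6.3, 7.6, 10.5, 11.4, 14.7, 15.2]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
s_xx = sum((xi - x_mean) ** 2 for xi in x)
b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / s_xx
b0 = y_mean - b1 * x_mean
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Classical standard error of the slope: assumes one common error variance.
s2 = sum(e ** 2 for e in residuals) / (n - 2)
se_classical = math.sqrt(s2 / s_xx)

# Robust (HC0) standard error of the slope: each case contributes its own
# squared residual, weighted by its squared distance from the mean of x.
meat = sum((xi - x_mean) ** 2 * e ** 2 for xi, e in zip(x, residuals))
se_robust = math.sqrt(meat) / s_xx

print(f"slope = {b1:.3f}")
print(f"classical SE = {se_classical:.4f}, robust SE = {se_robust:.4f}")
```

The coefficient estimates are identical in both cases; only the standard errors, and therefore the test outcomes, differ.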

Dummy variables:
Use dummies if your data is not interval or ratio level. Create a dummy for every category as
0 = not present, 1 = present. One dummy must be left out of the model; this is the reference
category. See example for interpretation.

Instead of defining dummies with binary/dummy coding, one can also use effect coding
(1, 0, -1) or contrast coding.
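
A minimal sketch of dummy coding with a reference category follows. The variable names and data are hypothetical. It relies on the fact that, in a model containing only the dummies, the OLS intercept equals the mean of the reference category and each dummy coefficient equals the difference between that category's mean and the reference mean.

```python
# Hypothetical data: a nominal predictor (region) and an outcome y.
categories = ["north", "south", "east", "north", "south", "east"]
y = [10.0, 14.0, 9.0, 12.0, 16.0, 11.0]

reference = "north"  # the category whose dummy is left out of the model
levels = sorted(set(categories))
dummy_levels = [lv for lv in levels if lv != reference]

# One 0/1 column per non-reference category.
dummies = [{lv: int(c == lv) for lv in dummy_levels} for c in categories]


def group_mean(level):
    """Mean of y within one category."""
    values = [yi for c, yi in zip(categories, y) if c == level]
    return sum(values) / len(values)


# OLS estimates for a model with only the dummies as predictors.
intercept = group_mean(reference)
coefficients = {lv: group_mean(lv) - intercept for lv in dummy_levels}

print("dummy rows:", dummies)
print("intercept (mean of reference category):", intercept)
print("dummy coefficients (difference from reference):", coefficients)
```

Interpretation: each coefficient tells how much higher or lower the expected outcome is in that category compared with the reference category.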




SabrinaKok Radboud Universiteit Nijmegen