Key takeaways for each week
WEEK 1
● Difference between correlation and causality
● Introduced the potential outcomes model
● Discussed the parameters of interest: the average treatment effect (ATE) and the average treatment effect on the treated (ATET)
● Policy makers and economists are often interested in causal effects.
● The potential outcomes model provides a statistical framework to analyze causal effects.
● If (in a nonexperimental setting) some outcome is correlated with treatment, this does
not necessarily imply causality.
● Social experiments provide a straightforward approach to estimate causal effects.
● Why are there not more social experiments?
○ 1. Social experiments may be costly.
○ 2. Ethical considerations.
● Selection issues are often present and complicate analysis.
● Selection can often be considered an omitted variables problem.
● Omitted variables may cause biased and inconsistent estimates.
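The selection argument in these bullets can be checked with a small simulation. The sketch below is illustrative and not taken from the course. It assumes a hypothetical unobserved "ability" that raises both the probability of treatment and the outcome without treatment. Because the simulation generates both potential outcomes, it can verify the decomposition: naive difference in means = ATET + selection bias, where selection bias = E[Y0i* | D=1] − E[Y0i* | D=0].

```python
import random
from statistics import fmean

random.seed(1)
N = 100_000
TRUE_EFFECT = 1.0  # assumed constant treatment effect, for illustration only

y0, y1, d = [], [], []
for _ in range(N):
    ability = random.gauss(0, 1)               # hypothetical unobserved confounder
    y0_i = 2.0 + ability + random.gauss(0, 1)  # potential outcome without treatment
    y1_i = y0_i + TRUE_EFFECT                  # potential outcome with treatment
    d_i = 1 if ability + random.gauss(0, 1) > 0 else 0  # high-ability units select in
    y0.append(y0_i)
    y1.append(y1_i)
    d.append(d_i)

# Observed outcome: only the potential outcome matching realized treatment is seen
y = [y1_i if d_i else y0_i for y0_i, y1_i, d_i in zip(y0, y1, d)]

treated = [i for i in range(N) if d[i] == 1]
control = [i for i in range(N) if d[i] == 0]

naive = fmean(y[i] for i in treated) - fmean(y[i] for i in control)
atet = fmean(y1[i] - y0[i] for i in treated)
selection_bias = fmean(y0[i] for i in treated) - fmean(y0[i] for i in control)

print(f"naive difference: {naive:.3f}")
print(f"ATET:             {atet:.3f}")
print(f"selection bias:   {selection_bias:.3f}")
```

The naive comparison overstates the assumed effect of 1.0 by exactly the selection-bias term. If D were randomly assigned, that term would be close to zero.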
WEEK 2
● There are many reasons why regressors may be endogenous (omitted variables,
reverse causality, measurement error, sample selection).
● Instrumental variables can deal with endogenous regressors.
● Showed that the IV estimate can be computed with the two-stage least squares (2SLS) estimator.
● Important that the instrumental variable is exogenous.
● IV estimators are biased in finite samples, but consistent.
● Bias can be severe if the instrument is weak (low predictive power for the endogenous regressor).
● IV estimates the average treatment effect for the compliers (LATE).
● Compliers are individuals who change treatment status when the instrument changes
value.
● Compliers cannot be identified directly.
● Biggest challenge is finding good instruments (exogenous and relevant).
● What a good instrument is depends on the application.
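The LATE interpretation can be illustrated by simulation. The sketch below assumes the following, all for illustration:

- a hypothetical population of compliers, always-takers and never-takers, with no defiers;
- a randomly assigned binary instrument Z;
- effects that differ by type.

With a single binary instrument and an intercept, the IV estimator reduces to the Wald ratio Cov(Z,Y)/Cov(Z,D), which equals the 2SLS estimate. It recovers the compliers' effect, not the population average.

```python
import random
from statistics import fmean

random.seed(2)
N = 200_000

# Hypothetical population shares and effects, chosen only for illustration
TYPES = ["complier", "always", "never"]
SHARES = [0.5, 0.25, 0.25]
EFFECT = {"complier": 2.0, "always": 4.0, "never": 0.5}
BASELINE = {"complier": 1.0, "always": 3.0, "never": 0.0}  # types differ in Y0 -> selection

z, d, y = [], [], []
for _ in range(N):
    t = random.choices(TYPES, weights=SHARES)[0]
    z_i = random.randint(0, 1)  # instrument is randomly assigned (exogenous)
    if t == "complier":
        d_i = z_i               # compliers follow the instrument
    elif t == "always":
        d_i = 1
    else:
        d_i = 0
    y0 = BASELINE[t] + random.gauss(0, 1)
    z.append(z_i)
    d.append(d_i)
    y.append(y0 + EFFECT[t] * d_i)

def cov(a, b):
    """Sample covariance (dividing by n)."""
    ma, mb = fmean(a), fmean(b)
    return fmean((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

# OLS slope of Y on D: contaminated by selection on baseline outcomes
ols = cov(d, y) / cov(d, d)

# IV (Wald) estimator: reduced form divided by first stage.
# With one binary instrument and an intercept this equals the 2SLS estimate.
iv = cov(z, y) / cov(z, d)

print(f"OLS:       {ols:.3f}")
print(f"IV (LATE): {iv:.3f}   assumed complier effect: {EFFECT['complier']}")
```

With the assumed shares, the population ATE is 0.5·2.0 + 0.25·4.0 + 0.25·0.5 = 2.125. The IV estimate (close to 2.0, up to sampling noise) therefore differs from the ATE, and both differ from the strongly biased OLS slope.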
WEEK 3
● Randomized experiments ensure that the estimated effects are causal (if randomization is carried out correctly).
● Experiments vary in scale and can be run in the field or in the laboratory.
● However, design is crucial: mistakes in the design cannot be corrected ex post.
● A balancing table is a way to check ex post whether randomization was done correctly.
● There are different alternatives to full randomization if that is complicated or infeasible.
● Power analysis computes the required size of an experiment.
● Possible complications for simple analysis: nonrandom selection, attrition,
noncompliance, externalities.
● Attrition is only problematic if it is related to potential outcomes (otherwise it only
reduces power).
● Threats to external validity: population, context, administration, equilibrium effects,
Hawthorne effect.
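For the power-analysis bullet, a common textbook approximation for comparing two means with equal-sized arms is:

n per arm = 2·(z_(1−α/2) + z_(1−β))²·σ² / δ²

Here δ is the minimum detectable effect and σ the outcome standard deviation. The sketch below assumes individual-level randomization without clustering, attrition or noncompliance. The function and parameter names are illustrative.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Units per arm for a two-sided test of equal means with equal-sized arms
    (normal approximation, common outcome standard deviation `sd`)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value of the two-sided test
    z_beta = z.inv_cdf(power)             # quantile that delivers the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / effect ** 2
    return math.ceil(n)                   # round up to whole units

n = sample_size_per_arm(effect=0.2, sd=1.0)
print(n)  # 393 units per arm, 786 in total
```

Because n scales with 1/δ², halving the minimum detectable effect roughly quadruples the required sample.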
WEEK 4
● Regression discontinuity allows estimation of causal effects in cases where treatment
is endogenous.
● Requires a discontinuous jump in the treatment probability at a threshold of the running variable.
● If the probability of treatment jumps from 0 to 1, discontinuity is sharp, otherwise it is
fuzzy
● Important to check:
○ Specification of the relationship between outcome and running variable (may be non-linear)
○ Bunching (manipulation of the running variable)
○ Continuity of other covariates around the threshold
● Various models for dummy dependent variables.
● Linear probability models are easy to analyze and easy to interpret.
● However, the linear functional form may be inconvenient (predicted probabilities can fall outside [0, 1]).
● Logit and Probit models guarantee that probabilities are bounded between 0 and 1.
● However, interpretation of coefficients is not straightforward (marginal effects depend on the values of the regressors).
● Link to consumer choice: the choice is modeled as based on a latent utility (unobserved;
only the chosen outcome is observed).
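The sharp RD idea from the bullets above can be sketched as a local linear regression on each side of the threshold. The simulated data assume a jump of 1.5 at a cutoff of 0. The sketch uses a uniform kernel and a few fixed bandwidths for illustration. Applied work typically uses a triangular kernel and data-driven bandwidth selection. A fuzzy design would additionally divide by the jump in treatment probability.

```python
import random
from statistics import fmean

random.seed(3)
CUTOFF = 0.0
TRUE_JUMP = 1.5  # assumed treatment effect at the threshold

# Simulated sharp design: treatment switches from 0 to 1 at the cutoff
x = [random.uniform(-1, 1) for _ in range(20_000)]
d = [1 if xi >= CUTOFF else 0 for xi in x]
y = [0.5 + 0.8 * xi + 0.6 * xi ** 2 + TRUE_JUMP * di + random.gauss(0, 0.5)
     for xi, di in zip(x, d)]

def intercept_at_cutoff(xs, ys):
    """OLS fit of y on (x - cutoff); returns the fitted value at the cutoff."""
    xc = [xi - CUTOFF for xi in xs]
    mx, my = fmean(xc), fmean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xc, ys))
             / sum((a - mx) ** 2 for a in xc))
    return my - slope * mx

def sharp_rd(x, y, bandwidth):
    """Local linear RD estimate: difference of the two fitted values at the cutoff."""
    left = [(xi, yi) for xi, yi in zip(x, y) if CUTOFF - bandwidth <= xi < CUTOFF]
    right = [(xi, yi) for xi, yi in zip(x, y) if CUTOFF <= xi <= CUTOFF + bandwidth]
    y_left = intercept_at_cutoff(*zip(*left))
    y_right = intercept_at_cutoff(*zip(*right))
    return y_right - y_left

for h in (0.1, 0.25, 0.5):
    print(f"bandwidth {h}: estimate {sharp_rd(x, y, h):.3f}")
```

Applying `sharp_rd` to a predetermined covariate instead of y gives a quick version of the continuity check. An estimate far from zero would be a warning sign.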
WEEK 5
● Panel data describe observations of individuals/regions/firms/etc. over time.
● Panel data models can deal with unit-specific effects and can solve many omitted
variable bias problems.
● Fixed effects or random effects model: the random effects model is more efficient but requires
stronger assumptions.
● Usual panel data models assume strict exogeneity of regressors.
● Dealing with lagged endogenous variables requires instrumental variable methods.
● Policy changes can be used for evaluation with observational data
● The before-after estimator compares outcomes before and after a policy is
implemented
● To correct for other things that change over time, subtract change in control group:
difference-in-differences estimator
● DD-estimator provides causal effect if common trend assumption holds
● Look at pre-trends and do placebo tests to investigate plausibility
● The DD estimate can be obtained from a simple regression of the outcome on a treatment-group dummy, a post-period dummy and their interaction
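The regression version of DD can be illustrated with a simulated two-group, two-period panel; all numbers are assumptions. In the saturated regression y = a + b·treat + c·post + δ·treat·post, the OLS coefficient δ equals the 2×2 difference-in-differences of group means exactly. The small `ols` helper solves the normal equations by hand only to stay within the standard library. In practice one would use a regression package and cluster standard errors by unit.

```python
import random
from statistics import fmean

random.seed(4)
TRUE_EFFECT = 2.0  # assumed policy effect, for illustration

rows = []  # (y, treat, post)
for _ in range(4_000):
    treat = random.randint(0, 1)
    for post in (0, 1):
        y = (10.0 + 3.0 * treat      # permanent level difference between groups
             + 1.5 * post            # common time trend (same for both groups)
             + TRUE_EFFECT * treat * post
             + random.gauss(0, 1))
        rows.append((y, treat, post))

def mean_y(treat, post):
    return fmean(r[0] for r in rows if r[1] == treat and r[2] == post)

# 2x2 difference-in-differences from group/period means
dd_means = (mean_y(1, 1) - mean_y(1, 0)) - (mean_y(0, 1) - mean_y(0, 0))

def ols(X, y):
    """Solve the normal equations X'X b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Regression y = a + b*treat + c*post + delta*(treat*post) + error
X = [[1.0, t, p, t * p] for _, t, p in rows]
a, b, c, delta = ols(X, [r[0] for r in rows])

print(f"DD from means:      {dd_means:.4f}")
print(f"DD from regression: {delta:.4f}")
```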
WEEK 6
● Heritabilities of social science outcomes are considerable.
● Genetic markers influence social science outcomes through non-deterministic
pathways that are likely to be difficult to disentangle.
● Social science outcomes are likely influenced by a very large number of SNPs, each
with tiny effect sizes
● Large sample sizes are key to achieving well-powered analyses
● GWAS methodology in combination with increasingly large sample sizes has
resulted in the discovery of many genome-wide significant SNPs for social science
outcomes
● The causality of genetic effects found in current GWAS studies can be questioned
● Principal components control for population stratification, but imperfectly
● Better solutions are possible when genetic data of family members is included in the
analysis, but such data is still too scarce for use in GWAS
● Polygenic scores are a summary measure of genetic endowments at the individual
level that are sufficiently predictive to be used in “regular” econometric analyses
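A polygenic score is typically computed as a weighted sum Σj β̂j·xij:

- xij ∈ {0, 1, 2} counts the effect alleles that person i carries at SNP j;
- β̂j is the GWAS effect estimate for SNP j.

The SNP identifiers and weights below are hypothetical placeholders, not real GWAS results. Real pipelines also handle strand alignment, linkage disequilibrium and standardization of the score before it enters an econometric model.

```python
# Hypothetical GWAS summary statistics: SNP id -> (effect allele, effect per allele)
GWAS_WEIGHTS = {
    "snp1": ("A", 0.021),
    "snp2": ("G", -0.013),
    "snp3": ("T", 0.008),
}

def polygenic_score(genotype, weights):
    """Sum over SNPs of (number of effect alleles carried, 0/1/2) x (GWAS effect size).
    SNPs missing from the individual's genotype are skipped."""
    score = 0.0
    for snp, (allele, beta) in weights.items():
        if snp in genotype:
            a1, a2 = genotype[snp]
            dosage = (a1 == allele) + (a2 == allele)  # 0, 1 or 2 copies
            score += dosage * beta
    return score

person = {"snp1": ("A", "A"), "snp2": ("A", "G"), "snp3": ("C", "C")}
print(round(polygenic_score(person, GWAS_WEIGHTS), 3))  # 2*0.021 + 1*(-0.013) + 0 = 0.029
```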
Week 1.1 - Instrumental variables
Difference between correlation and causality
Causality is about questions such as:
● What would have happened (under a different choice or policy)?
● What would happen (if a policy were introduced)?
→ This requires knowledge of unobserved outcomes, because we only observe one potential
outcome.
E.g. ‘Do people earn more if they complete university education?’
Correlation is a measure of the association between two variables.
→ One approach would be to compare those with university education to those without
university education, but such a comparison only measures a correlation.
However, a correlation between D (treatment, e.g. university education) and Y (outcome, e.g. earnings) can be caused by:
1. A causal effect of D on Y
2. A causal effect of Y on D
3. Omitted variables: Z affects both D and Y
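The third explanation, an omitted Z that affects both D and Y, is the classic omitted-variable-bias case. In the simulation below, all coefficients are assumptions. The short regression of Y on D converges to β + γ·Cov(D,Z)/Var(D) instead of β. Controlling for Z recovers β; here the control is done via the Frisch-Waugh-Lovell theorem, to keep to simple regressions.

```python
import random
from statistics import fmean

random.seed(5)
N = 50_000
BETA, GAMMA = 1.0, 2.0   # assumed causal effect of D, and effect of the omitted Z on Y

z = [random.gauss(0, 1) for _ in range(N)]
d = [0.8 * zi + random.gauss(0, 1) for zi in z]   # Z drives D ...
y = [BETA * di + GAMMA * zi + random.gauss(0, 1)   # ... and Y directly
     for di, zi in zip(d, z)]

def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    mx, my = fmean(x), fmean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Short regression: Y on D only (Z omitted)
b_short = slope(d, y)

# Long regression via Frisch-Waugh-Lovell: regress Y on the part of D not explained by Z
pi = slope(z, d)
mz, md = fmean(z), fmean(d)
d_resid = [di - md - pi * (zi - mz) for di, zi in zip(d, z)]
b_long = slope(d_resid, y)

print(f"short regression: {b_short:.3f}")  # biased upward
print(f"long regression:  {b_long:.3f}")   # close to BETA = 1.0
```

With these assumptions, the bias formula predicts a short-regression slope of about 1 + 2·0.8/1.64 ≈ 1.98.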
Potential outcome model
= general model to think about causal effects.
● Assume a treatment or choice variable D can take two values (0/1)
● Each individual i has two potential outcomes, Y1i* with treatment and Y0i* without
treatment.
● Only one potential outcome is observed (factual). The unobserved outcome is the
counterfactual outcome.
● For an individual, the effect of participating in the treatment equals: ∆i = Y1i* − Y0i*
● ∆i is always unobserved, because only one of the random variables Y1i* and Y0i* is
observed (= the fundamental problem of causal inference).
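A simulation generates both potential outcomes, so ∆i can be computed for every unit, which real data never allow. The sketch below rests on two assumptions: gains are heterogeneous, and individuals with larger gains are more likely to select into treatment. Under these assumptions the ATE = E[∆i] and the ATET = E[∆i | Di = 1] differ.

```python
import random
from statistics import fmean

random.seed(6)
N = 100_000

units = []
for _ in range(N):
    y0 = random.gauss(10, 2)        # potential outcome without treatment
    gain = random.gauss(1.0, 1.0)   # individual effect Delta_i (heterogeneous)
    y1 = y0 + gain                  # potential outcome with treatment
    d = 1 if gain + random.gauss(0, 1) > 1.0 else 0  # large gains -> more likely to select in
    units.append((y0, y1, d))

ate = fmean(y1 - y0 for y0, y1, _ in units)
atet = fmean(y1 - y0 for y0, y1, d in units if d == 1)

# What a researcher actually observes: one potential outcome per unit, plus D
observed = [(y1 if d else y0, d) for y0, y1, d in units]

print(f"ATE:  {ate:.3f}")
print(f"ATET: {atet:.3f}")
```

A researcher only has `observed`. Without further assumptions (such as random assignment), neither average can be computed from it directly.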