(0) Fundamentals in Stats1
● Sampling from a population → simple random, systematic, stratified, cluster, convenience, etc.
● Descriptive statistics: summarise sample or population data with numbers, tables and graphs
● Inferential statistics: make predictions about population parameters, based on sample data
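The sampling schemes listed above can be sketched in a few lines of Python. This is only an illustration, not a survey-grade implementation: the population of unit IDs, the even/odd strata and the blocks-of-10 clusters are all made-up assumptions.

```python
import random

def simple_random_sample(population, n, rng):
    """Every unit has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th unit."""
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample separately within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit inside them."""
    clusters = sorted({cluster_of(u) for u in population})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in population if cluster_of(u) in chosen]

rng = random.Random(0)
population = list(range(100))  # hypothetical unit IDs 0..99

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(population, lambda u: u % 2, 5, rng)  # strata: even / odd IDs
clus = cluster_sample(population, lambda u: u // 10, 2, rng)    # clusters: blocks of 10 IDs
```

Convenience sampling has no random step at all (e.g. taking whichever units are easiest to reach), which is why it is the hardest to generalise from.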
(1) Associations and Causality
Spurious association: a connection between two variables that appears to be causal but is not
● Correlation ≠ causation
1.1 Criteria for Establishing Causality
John Stuart Mill (1843)
We can only argue that B is caused by A if…
1. There is a relationship between A and B (association)
2. B must take place after A (appropriate time order)
3. The association between A and B is not explained by other factors (elimination of alternative
explanations)
1.1.1 Eliminating Alternative Explanations
→ control for other variables to eliminate their effects
Experimental Control: in research design
● Randomised Controlled Trial (RCT) is often considered the gold standard
○ Time order manipulated (criterion 2)
○ Alternative explanations (partially) excluded through randomisation (criterion 3)
○ Both observable and non-observable characteristics must be equal across the groups being compared
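Why randomisation helps with non-observable characteristics can be illustrated with a small simulation. This is a sketch under assumptions: the "trait" is a hypothetical unobserved variable (e.g. motivation), and the sample size and seed are arbitrary.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical unobserved trait for 10,000 participants (assumed standard normal).
trait = [rng.gauss(0, 1) for _ in range(10_000)]

# Randomly assign exactly half of the participants to the treatment group.
ids = list(range(len(trait)))
rng.shuffle(ids)
treated = set(ids[: len(ids) // 2])

treat_mean = mean(trait[i] for i in treated)
control_mean = mean(trait[i] for i in ids if i not in treated)

# With random assignment the two group means should differ only by chance.
gap = abs(treat_mean - control_mean)
print(f"gap in unobserved trait between groups: {gap:.3f}")
```

The researcher never measures the trait, yet the two groups end up with nearly the same average of it. The balance holds on average and improves with sample size; in a small trial, chance imbalances can remain.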
Statistical Control: in data-analysis strategy
● Option 1: examine X-Y relationship within subgroups (based on other variables)
→ often unrealistic
● Option 2: include alternative explanations in statistical model
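Option 2 is commonly implemented as a multiple regression. A minimal sketch, assuming a linear model with a single control variable Z (the notation is illustrative and not taken from these notes):

```latex
Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \varepsilon_i
```

Here \beta_1 is the association between X and Y while holding Z constant. If the X-Y association is entirely due to Z, \beta_1 should be close to zero once Z is in the model.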
1.2 Multivariate Associations
Involves evaluating multiple variables (more than two) to identify any possible association
among them
● Important to recognise relevant alternative explanations → know your theory
● Adjust your statistical analyses and interpretation accordingly → know your statistics
→ to avoid biased results due to lurking variables
Types of Multivariate Associations
1.2.1 Spurious Associations
When both variables are also related to a third variable and the association between X
and Y disappears (mostly) when controlling for the third variable
E.g. the association between height and maths skills is fully explained by school grade
[Diagram: grade level → height (+); grade level → maths skills (+); no direct link between height and maths skills (marked x)]
→ Consequently, the estimated association between variables can change dramatically depending on
the data-analysis strategy
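The height and maths example can be reproduced with simulated data. This is a sketch under assumptions: all numbers (grade range, effect sizes, noise levels) are invented for illustration, and by construction height has no direct effect on the maths score.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)

# Hypothetical data: grade level drives both height and maths score.
grade = [rng.randint(1, 6) for _ in range(6000)]
height = [110 + 6 * g + rng.gauss(0, 5) for g in grade]  # cm
maths = [20 + 10 * g + rng.gauss(0, 8) for g in grade]   # test points

# Strategy A: ignore grade level entirely.
raw_r = pearson(height, maths)

# Strategy B: look at the association within each grade level.
within_r = {}
for g in range(1, 7):
    idx = [i for i, gi in enumerate(grade) if gi == g]
    within_r[g] = pearson([height[i] for i in idx], [maths[i] for i in idx])

# Strategy C: partial correlation of height and maths, controlling for grade.
r_hg, r_mg = pearson(height, grade), pearson(maths, grade)
partial_r = (raw_r - r_hg * r_mg) / sqrt((1 - r_hg**2) * (1 - r_mg**2))

print(f"raw: {raw_r:.2f}, partial: {partial_r:.2f}")
print({g: round(r, 2) for g, r in within_r.items()})
```

Under these assumptions the raw correlation is strongly positive, while the within-grade and partial correlations are close to zero. The data are the same in all three cases; only the analysis strategy differs.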