Causal analysis techniques (CAT)
Lecture 1
We expect you have working knowledge about:
• Descriptive statistics
• Inferential statistics (null hypothesis significance testing; chapters 8 and 9 of
Warner I)
Course logistics: content (you will learn how and when to use these techniques)
• One-way between-subjects analysis of Variance (ANOVA)
• Pearson’s (Partial) correlation coefficient
• Bivariate regression
• Multiple regression
• Elaboration logic
• Path analysis
• Logistic regression analysis
These techniques will help you answer what and why research questions.
They have in common:
- They identify systematic dependencies between variables
- They might include systematic errors/residuals due to unmeasured causes
- They will include random errors/residuals due to random fluctuation (variation that
we can’t explain or don’t see). Variation can be systematic (explainable when we take
other information into account), but there is also random error: in social science we
can’t always explain everything, and not all people are the same.
They are distinguished by:
- Measurement levels of the dependent variables
- Measurement levels of the independent variables
- Number of variables (complexity of underlying theory)
                               Dependent variable (Y)
Independent variables (X)      Quantitative (interval/ratio)    Qualitative (nominal)
Small number (1 or 2),         ANOVA                            Table analysis
qualitative                    (few categorical X, one Y)       (not part of this course)
Any number, qualitative        Bivariate regression             Logistic regression
and/or quantitative            (one X, one Y)                   (bivariate or multiple)
                               Multiple regression
                               (many X, one Y)
                               Path analysis
                               (mixing X and Y)
One-way between-subjects analysis of variance (ANOVA)
(Warner I, chapter 13: one-way between-subjects analysis of variance)
In this lecture:
• In which situation ANOVA is applicable
• What hypotheses can be tested with ANOVA
• The key logic behind ANOVA
• How to calculate the deviations from the mean(s)
Logic of ANOVA
Substantive hypothesis:
A person’s degree of organizational commitment (Y) depends on the team in which the
person works (X). (ANOVA is used to answer questions like this.)
Team in which someone works (X) → organizational commitment (Y)
Y is most likely the dependent variable.
• Question: if the hypothesis is correct, what would you expect to find with regard to
differences in average commitment between the teams?
• Imagine that we have collected data of measurements of organizational commitment
for 3 teams.
• 2 scenarios with regard to the data…
Team (X) is a categorical variable: it can have more than 2 categories and it can be nominal.
Logic of ANOVA
            Scenario 1                    Scenario 2
            Team 1   Team 2   Team 3      Team 1   Team 2   Team 3
            0        7        1           2        9        4
            1        8        2           2        10       5
            3        9        5           3        10       5
            4        13       8           3        11       6
            6        15       11          4        12       7
Average     2.8      10.4     5.4         2.8      10.4     5.4
Variance    5.7      11.8     17.3        0.7      1.3      1.3
In which of the data scenarios would you be more inclined to conclude that there is a
connection between the team in which someone works and organizational commitment?
The team means are identical in both scenarios; the difference lies in the variances.
The variance describes the variation within a team (how wide the distribution of scores is).
Because the scores within each team vary much less in scenario 2, we are more confident
that the differences between the team means are significant in scenario 2 than in scenario 1.
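The averages and variances in the table can be reproduced with a few lines of Python. This is only a sketch for checking the numbers: the variable and function names are made up for illustration, and the variance used is the sample variance (dividing by n - 1), which is what matches the table.

```python
from statistics import mean, variance

# Commitment scores per team, copied from the two scenarios in the table.
scenario_1 = {
    "Team 1": [0, 1, 3, 4, 6],
    "Team 2": [7, 8, 9, 13, 15],
    "Team 3": [1, 2, 5, 8, 11],
}
scenario_2 = {
    "Team 1": [2, 2, 3, 3, 4],
    "Team 2": [9, 10, 10, 11, 12],
    "Team 3": [4, 5, 5, 6, 7],
}


def describe(groups):
    """Return {team: (mean, sample variance)}, rounded to one decimal."""
    return {
        team: (round(mean(scores), 1), round(variance(scores), 1))
        for team, scores in groups.items()
    }


summary_1 = describe(scenario_1)
summary_2 = describe(scenario_2)

for label, summary in (("Scenario 1", summary_1), ("Scenario 2", summary_2)):
    for team, (avg, var) in summary.items():
        print(f"{label} {team}: average = {avg}, variance = {var}")
```

The means come out the same in both scenarios; only the variances within the teams differ.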
Key idea of ANOVA:
When there are 2 or more groups, can we make a statement about possible (significant)
differences between the mean scores of the groups?
What could we do if there were only 2 groups? A t-test.
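For the two-group case, a minimal sketch of an independent-samples t statistic is given here. It assumes the equal-variance (pooled) version of the t-test and, purely as an illustration, compares Team 1 and Team 2 from scenario 2; the function name is hypothetical and no p-value is computed.

```python
from math import sqrt
from statistics import mean, variance

# Scenario 2 scores for two of the three teams (illustrative choice).
team_1 = [2, 2, 3, 3, 4]
team_2 = [9, 10, 10, 11, 12]


def pooled_t(a, b):
    """Independent-samples t statistic, assuming equal population variances."""
    n_a, n_b = len(a), len(b)
    # Pooled variance: within-group variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error


t_value = pooled_t(team_1, team_2)
degrees_of_freedom = len(team_1) + len(team_2) - 2

print(f"t = {t_value:.2f} with {degrees_of_freedom} degrees of freedom")
```

With three teams this approach would need three pairwise comparisons, whereas ANOVA tests all group means in one analysis.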
Fundamental principle of ANOVA:
ANOVA analyses the ratio of the two components of total variance in data: between-group
variance and within-group variance
\[
\frac{\text{Information on variance of average scores between groups}}{\text{Information on variance of scores within groups}}
\]
Even though the differences between the means are the same in both scenarios, the variances within the groups are different.
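To see what this ratio does with the two scenarios, the sketch below computes it as the usual F ratio (mean square between groups divided by mean square within groups). The mean-square formulas are the standard textbook ones and are an assumption here, since the notes have not spelled them out at this point; the function and variable names are made up for illustration.

```python
from statistics import mean, variance

scenario_1 = [[0, 1, 3, 4, 6], [7, 8, 9, 13, 15], [1, 2, 5, 8, 11]]
scenario_2 = [[2, 2, 3, 3, 4], [9, 10, 10, 11, 12], [4, 5, 5, 6, 7]]


def f_ratio(groups):
    """Between-group mean square divided by within-group mean square."""
    scores = [score for group in groups for score in group]
    grand_mean = mean(scores)
    k = len(groups)          # number of groups
    n_total = len(scores)    # total number of observations

    # Between groups: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of the scores around their own group mean.
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


f_scenario_1 = f_ratio(scenario_1)
f_scenario_2 = f_ratio(scenario_2)

print(f"Scenario 1: F = {f_scenario_1:.2f}")
print(f"Scenario 2: F = {f_scenario_2:.2f}")
```

The numerator is identical in both scenarios because the team means are identical; the ratio is much larger in scenario 2 only because the within-group variance is smaller.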
Fundamental principles of ANOVA:
ANOVA analyses a ratio in which
between-group variance measures systematic differences between the groups plus the
influence of all other variables that influence Y, either systematically or randomly
(‘residual variance’ or ‘error’)
and
within-group variance measures the influence of all other variables that influence Y, either
systematically or randomly (‘residual variance’ or ‘error’).
Important to realize:
1. Any differences within a group cannot be due to differences between the groups,
because everyone in a particular group has the same group score; so within-group
differences must be due to systematic unmeasured factors (e.g., individual
differences) or random measurement error
2. Any observed differences between groups are probably not only pure between-group
differences, but also differences due to systematic unmeasured factors or random
measurement error
Compare…
Between-group variability (=systematic group effect + error)
to
Within-group variability (=error)
…to learn about the size of the systematic group effect
Within a group there is some variation, but this cannot be a systematic difference
caused by the group itself.
(In theory, the independent variable determines the outcome of the dependent variable, but in
reality a lot can go wrong.)
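The split into a between-group part and a within-group part can be checked numerically. The sketch below uses the scenario 1 data and the standard sums-of-squares decomposition, which is an assumption in the sense that the notes describe it only in words; all names are made up for illustration.

```python
from statistics import mean

# Scenario 1 scores, one list per team.
groups = [[0, 1, 3, 4, 6], [7, 8, 9, 13, 15], [1, 2, 5, 8, 11]]
scores = [score for group in groups for score in group]
grand_mean = mean(scores)

# Total: deviation of every score from the grand mean.
ss_total = sum((score - grand_mean) ** 2 for score in scores)

# Between: deviation of each group mean from the grand mean,
# counted once for every member of that group.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within: deviation of every score from its own group mean.
ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)

print(f"SS total   = {ss_total:.1f}")
print(f"SS between = {ss_between:.1f}")
print(f"SS within  = {ss_within:.1f}")
print(f"between + within = {ss_between + ss_within:.1f}")
```

Each score's deviation from the grand mean is the sum of a group part (group mean minus grand mean) and an individual part (score minus group mean), which is why the two sums of squares add up to the total.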
Statistical null hypothesis of one-way between-subjects ANOVA:
Mean scores of k populations corresponding to the groups in the study are all equal to each
other: