Week 1 – Lecture 1 + 2 – ANOVA I
ANOVA
= Interested in mean differences between groups
Tests whether the mean of a continuous outcome differs across 2+ independent groups
of one categorical factor.
Example:
Substantive hypothesis:
A person’s degree of organizational commitment (Y) depends on the team in which the
person works (X)
Team in which someone works (X) → Organizational commitment (Y)
- a categorical variable (independent) combined with a continuous variable
(dependent)
Q: if the substantive hypothesis is correct, what would you expect to find with regard to
differences in average commitment between the teams? → at least one team’s average
commitment differs significantly from another
- Imagine that we have collected measurements of organizational commitment
for 3 teams
- For now, we have 2 scenarios with regard to the data
In this example, you would be more inclined to conclude there is a connection between
the team someone works in and their organizational commitment in Scenario 2, even
though the averages are the same → you also need to look at the variance
Key idea of ANOVA:
When there are 2 or more groups, we can make a statement about possible -significant-
differences between the mean scores of the groups
- What could we do if there were only 2 groups? → (Independent sample) t-test
- ANOVA is a more general version of the t-test
ANOVA compares:
- Between-group variance → how much the group means differ from each other.
- Within-group variance → how much individual scores vary within each group.
o Within groups the variance cannot be due to group-membership,
because all members belong to the same group, or got the same
experimental treatment.
If the between-group variance is large relative to the
within-group variance, the F-statistic will be large →
more evidence that group means differ significantly.
Fundamental principle of ANOVA
ANOVA analyses the ratio (= one number divided by
another to compare their sizes) of the two components
of total variance in data:
between-group variance and within-group variance
information on variance of average scores between groups (systematic information)
divided by
information on variance of scores within groups (non-systematic information)
Both scenarios have the same team means (≈ 2.8, 10.4, 5.4), so the between-group
variance is identical.
What differs is the within-group variance:
- Scenario 1 (top): big spreads inside teams (variances 5.7, 11.8, 17.3). Even
though means differ, the data are noisy → F-statistic smaller
- Scenario 2 (bottom): tight clusters within teams (variances 0.7, 1.3, 1.3). Same
mean gaps, much less noise → F-statistic larger → stronger evidence that a
larger part of the variation in organizational commitment (Y) can be attributed to
which team (X) someone is in
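The contrast between the two scenarios can be checked directly from the summary statistics above (group means, within-group variances, and n = 5 scores per team, as in the tables further down). A minimal sketch, in which the helper name `f_from_summary` is made up for illustration; note that averaging the sample variances as the pooled error variance is only valid because the groups are equally sized:

```python
# Compute F from summary statistics (equal group sizes assumed).
def f_from_summary(means, variances, n):
    k = len(means)
    grand = sum(means) / k                                # grand mean (equal n)
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum(variances) / k                        # pooled error variance
    return ms_between / ms_within

means = [2.8, 10.4, 5.4]                                  # identical in both scenarios
f1 = f_from_summary(means, [5.7, 11.8, 17.3], n=5)        # Scenario 1: noisy teams
f2 = f_from_summary(means, [0.7, 1.3, 1.3], n=5)          # Scenario 2: tight teams
print(round(f1, 2), round(f2, 2))                         # → 6.43 67.82
```

The same means give the same numerator in both scenarios; only the denominator (the noise) changes, which is why F jumps from about 6.4 to about 68.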
ANOVA analyses ratio in which:
- between-group variance measures systematic differences between groups
and all other variables that influence Y, either systematically or randomly
(‘residual variance’ or ‘error’)
→ Systematic differences (variance explained by factor X) + error
and
- within-group variance measures influence of all other variables that
influence Y either systematically or randomly (‘residual variance’ or ‘error’)
→ Error (variance not explained by factor X)
,Important to realize:
1. Any differences within a group cannot be due to differences between the groups,
because everyone in a particular group has the same score on the group factor;
so, within-group differences must be due to systematic unmeasured factors (e.g.,
individual differences such as age, personality, intelligence) or random
measurement error
a. Systematic = a difference we can link to some known factor
2. Any observed differences between groups are probably not only pure between-
group differences, but also differences due to systematic unmeasured factors or
random measurement error
So, compare between-group variability (= systematic group effect + error)
to within-group variability (= error → these differences cannot be attributed to the factor)
… to learn about the size of the systematic group effect → how
strong is it?
Statistical null hypothesis
of One-Way Between-Subjects ANOVA:
Mean scores of k populations corresponding to the groups in the study are all equal to
each other:
H0: μ1 = μ2 = … = μk → there are no mean differences. We want to reject H0 to
show that the means do significantly differ.
Only one mean has to be different in order to reject H0.
- When AT LEAST ONE mean is significantly different from the other means (the F-
test does not show where this difference exists) → Exam
Why prefer One-Way Between-Subjects ANOVA instead of separate t-tests
for means? (Warner I, p. 390)
- In our example with 3 teams, we could also conduct 3 separate t-
tests for means:
Problem of this approach: The larger the number of tests that is applied to
a dataset, the larger the chance of rejecting the null hypothesis while it is
correct (Type I error)
- Why? Follows from logic of hypothesis testing: we reject the null
hypothesis if a result is exceptional, but the more tests we conduct, the
easier it is to find an exceptional result
- One will more easily make the mistake of concluding that there is an effect
while there is none
- This is called: ‘inflated risk of Type I errors’ (Warner I, p. 390)
→ the type I error of 0.05 accumulates → multiple testing inflates our type I error
→ so, the chance is not 0.05, but after three tests it is 0.143 to make a type I error
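The inflation can be verified with a one-line calculation of 1 − (1 − α)^m for m tests at α = .05 (a sketch that, like the figure above, treats the tests as independent):

```python
# Familywise Type I error rate after m independent tests at alpha = .05
alpha = 0.05
familywise = {m: 1 - (1 - alpha) ** m for m in (1, 2, 3)}
for m, fw in familywise.items():
    print(m, round(fw, 3))   # m = 3 gives 0.143, matching the note above
```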
Solution → one single omnibus test (one big test) for the null hypothesis that the
means of k populations are equal, with chance of a Type I error = .05
= One-Way ANOVA
F-statistic
→ If we want to test the statistical null hypothesis with an ANOVA, the F-distribution is
used
- In order to determine if a specific sample result is exceptional (‘significant’)
under the assumption that the statistical null hypothesis is correct, the
test-statistic F must be calculated
Calculations: → previous example (scenario 1)
Step 1: calculate the deviation of each group mean from the grand mean (between-group) →
1.1 calculate the grand mean of the entire data → (2.8 + 10.4 + 5.4)/3 = 6.2
1.2 calculate the group mean of each group
- Between the means → there is a difference in means
- Systematic differences
- Is it significant? → can we generalize the finding to the population?
- 𝛼𝑖 denotes the ‘effect of group 𝑖’ (do not confuse with significance level!)
Group/team 1 Group/team 2 Group/team 3
2.8 – 6.2 = -3.4 10.4 – 6.2 = 4.2 5.4 – 6.2 = -0.8
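Step 1 can be reproduced in a few lines of Python, using the Scenario 1 scores that appear in the Step 2 table below (the team labels are just dictionary keys for illustration):

```python
# Step 1: grand mean and group effects (group mean minus grand mean)
groups = {
    "team1": [0, 1, 3, 4, 6],
    "team2": [7, 8, 9, 13, 15],
    "team3": [1, 2, 5, 8, 11],
}
group_means = {g: sum(xs) / len(xs) for g, xs in groups.items()}
grand_mean = sum(group_means.values()) / len(group_means)   # valid: equal n
effects = {g: round(m - grand_mean, 1) for g, m in group_means.items()}
print(round(grand_mean, 1), effects)   # grand mean 6.2; effects -3.4, 4.2, -0.8
```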
Step 2: calculate the deviation score of every individual from the group mean→
- Variation within the groups can’t be explained → unmeasured confounding
→ the non-systematic information that we have
- Within-group
Group/team 1 Group/team 2 Group/team 3
0 – 2.8 = -2.8 7 – 10.4 = -3.4 1 – 5.4 = -4.4
1 – 2.8 = -1.8 8 – 10.4 = -2.4 2 – 5.4 = -3.4
3 – 2.8 = 0.2 9 – 10.4 = -1.4 5 – 5.4 = -0.4
4 – 2.8 = 1.2 13 – 10.4 = 2.6 8 – 5.4 = 2.6
6 – 2.8 = 3.2 15 – 10.4 = 4.6 11 – 5.4 = 5.6
Step 3: calculate the deviation score of every individual from the grand mean →
Group/team 1 Group/team 2 Group/team 3
0 – 6.2 = -6.2 7 – 6.2 = 0.8 1 – 6.2 = -5.2
1 – 6.2 = -5.2 8 – 6.2 = 1.8 2 – 6.2 = -4.2
3 – 6.2 = -3.2 9 – 6.2 = 2.8 5 – 6.2 = -1.2
4 – 6.2 = -2.2 13 – 6.2 = 6.8 8 – 6.2 = 1.8
6 – 6.2 = -0.2 15 – 6.2 = 8.8 11 – 6.2 = 4.8
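Putting steps 1–3 together (squaring and summing the deviation scores, which anticipates the sums-of-squares formulas the F-test uses), the Scenario 1 data can be carried all the way to F in a short sketch:

```python
# One-way ANOVA by hand for Scenario 1, following the three steps above
groups = [[0, 1, 3, 4, 6], [7, 8, 9, 13, 15], [1, 2, 5, 8, 11]]
k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = sum(x for g in groups for x in g) / n_total

# Step 1, squared and summed: between-group sum of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Step 2, squared and summed: within-group sum of squares
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
# Step 3, squared and summed: total sum of squares (= between + within)
ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

F = (ss_between / (k - 1)) / (ss_within / (n_total - k))   # df = 2 and 12
print(round(ss_between, 1), round(ss_within, 1), round(F, 2))   # → 149.2 139.2 6.43
```

Note that ss_total equals ss_between + ss_within exactly: the total variation partitions into a systematic part and an error part, which is the fundamental principle stated earlier.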