W1.1
ANOVA
Ch14.1
Analysis of variance (ANOVA): The inferential method for comparing the means of several groups
- Compares more than two groups at the same time to determine whether the group means
differ, i.e. whether the response is related to the grouping variable
- F statistic/F-ratio: The test statistic of ANOVA – The ratio of the variability
between the sample means to the variability within the samples
- Factors: Categorical explanatory variables in multiple regression and in ANOVA
- If groups truly differ, the between-group variability tends to be larger than the
within-group variability
ANOVA SIGNIFICANCE TEST
1. Assumptions:
- Applicable in cases of a categorical explanatory variable and a quantitative
response variable – The explanatory variable should have at least 3 groups
- The population distributions of the response variable for the g groups are
approximately normal – Not too important when the sample sizes are large
- The same standard deviation for each group
- Independent random samples
- Group sizes are equal – Not required by ANOVA in general, but assumed by the
simplified formulas in step 3
2. Hypotheses:
- $H_0: \mu_1 = \mu_2 = \dots = \mu_g$
- $H_a$: at least two of the population means are different
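3. Test statistic:
$$F = \frac{\text{between-groups variability}}{\text{within-groups variability}} = \frac{\text{between-groups estimate of } \sigma^2}{\text{within-groups estimate of } \sigma^2}$$
$$\text{Within-groups variability} = \frac{s_1^2 + s_2^2 + \dots + s_g^2}{g}$$
$$\text{Between-groups variability} = \frac{n\left[(\bar{y}_1 - \bar{y})^2 + (\bar{y}_2 - \bar{y})^2 + \dots + (\bar{y}_g - \bar{y})^2\right]}{g - 1}$$
where n is the common group size, $\bar{y}_i$ is the sample mean of group i, and $\bar{y}$ is the overall mean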
4. P-value: 1-F.DIST(F-score; df1; df2; TRUE)
Degrees of freedom:
- df1 = g – 1
- df2 = N – g → N = total number of subjects
5. Conclusion: The smaller the P-value, the more unusual the sample data would be if $H_0$
were true, so the stronger the evidence against $H_0$ and in favour of $H_a$
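As a rough illustration of step 3, the sketch below computes the F statistic and both degrees of freedom from the formulas in these notes. It assumes equal group sizes; the function name `anova_f` and the data are hypothetical and chosen only for illustration. The P-value would then follow from 1-F.DIST as in step 4.

```python
from statistics import mean, variance


def anova_f(groups):
    """Return (F, df1, df2) using the equal-group-size formulas from the notes."""
    g = len(groups)          # number of groups
    n = len(groups[0])       # common group size
    if any(len(grp) != n for grp in groups):
        raise ValueError("the simplified formulas assume equal group sizes")

    group_means = [mean(grp) for grp in groups]
    # With equal group sizes, the mean of the group means is the overall mean.
    overall_mean = mean(group_means)

    # Within-groups estimate: average of the g sample variances.
    within = sum(variance(grp) for grp in groups) / g
    # Between-groups estimate: n times the variance of the group means.
    between = n * sum((m - overall_mean) ** 2 for m in group_means) / (g - 1)

    df1 = g - 1
    df2 = g * n - g          # N - g, with N = g * n
    return between / within, df1, df2


# Hypothetical data: three groups of three observations each.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
F, df1, df2 = anova_f(groups)
print(f"F = {F:.2f}, df1 = {df1}, df2 = {df2}")
```

A large F, as here, means the group means vary more than the spread within groups would explain on its own.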
Source   df          SS                                     MS                        F                        P
Group    df1         MS group × df1                         Between-groups estimate   Ratio of the MS values   P-value
Error    df2         MS error × df2                         Within-groups estimate
Total    df1 + df2   Between-groups SS + Within-groups SS
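The rows of the ANOVA table can be checked numerically. The sketch below is a minimal illustration, again assuming equal group sizes: it builds each SS as MS × df and also computes the total SS directly from the raw data, so the identity Total SS = Between-groups SS + Within-groups SS can be verified. The function name `anova_table` and the data are hypothetical.

```python
from statistics import mean, variance


def anova_table(groups):
    """Return the ANOVA table as a dict of rows (equal group sizes assumed)."""
    g = len(groups)
    n = len(groups[0])
    if any(len(grp) != n for grp in groups):
        raise ValueError("the simplified formulas assume equal group sizes")
    all_values = [y for grp in groups for y in grp]
    overall_mean = mean(all_values)

    df_group, df_error = g - 1, g * n - g
    # Mean squares: the between-groups and within-groups estimates of the variance.
    ms_group = n * sum((mean(grp) - overall_mean) ** 2 for grp in groups) / df_group
    ms_error = sum(variance(grp) for grp in groups) / g
    # Sums of squares: MS times the matching degrees of freedom.
    ss_group = ms_group * df_group
    ss_error = ms_error * df_error
    # Total SS computed directly, as a check on ss_group + ss_error.
    ss_total = sum((y - overall_mean) ** 2 for y in all_values)

    return {
        "Group": {"df": df_group, "SS": ss_group, "MS": ms_group, "F": ms_group / ms_error},
        "Error": {"df": df_error, "SS": ss_error, "MS": ms_error},
        "Total": {"df": df_group + df_error, "SS": ss_total},
    }


# Hypothetical data: three groups of three observations each.
table = anova_table([[2.0, 4.0, 6.0], [5.0, 7.0, 9.0], [11.0, 13.0, 15.0]])
for source, row in table.items():
    print(source, row)
```

The P column is left out here because the standard library has no F distribution; it would come from 1-F.DIST with the df values in the table.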
Within-groups estimate = ERROR
- Unbiased: The sampling distribution has $\sigma^2$ as its mean, regardless of whether or not
$H_0$ is true
Between-groups estimate = GROUP
- Unbiased only when $H_0$ is true → When $H_0$ is false, the between-groups estimate
tends to overestimate $\sigma^2$
Variance = (standard deviation)$^2$
Cut-off value: F.INV(1-alpha; df1; df2)
Ch14.2 ESTIMATING DIFFERENCES IN GROUPS FOR A SINGLE FACTOR
FISHER METHOD
The ANOVA significance test does not indicate which groups differ or by how much –
Confidence intervals can estimate the differences
Assumptions for post-hoc confidence intervals:
1. Normal population distributions
2. Identical standard deviations
3. Data that resulted from randomisation
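$$\bar{y}_i - \bar{y}_j \pm t \cdot SE \;\rightarrow\; \bar{y}_i - \bar{y}_j \pm t_{0.025} \cdot s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$
$$s = \sqrt{\frac{s_1^2 + s_2^2 + \dots + s_g^2}{g}} = \text{square root of the within-groups variance}$$
$$SE = s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$
t = T.INV(0,975; df2), with df2 = N – g, for a 95% CI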
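The Fisher interval can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: the function name `fisher_ci` and the data are hypothetical, s is pooled as the square root of the average group variance (the equal-group-size formula from these notes), and the critical t value is passed in by the caller, for example from T.INV(0,975; df2) or a t table.

```python
from math import sqrt
from statistics import mean, variance


def fisher_ci(groups, i, j, t_crit):
    """Confidence interval for mean(group i) - mean(group j).

    t_crit is the t critical value with df = N - g, supplied by the caller.
    """
    g = len(groups)
    # s: square root of the within-groups variance (average of the group variances).
    s = sqrt(sum(variance(grp) for grp in groups) / g)
    n_i, n_j = len(groups[i]), len(groups[j])
    se = s * sqrt(1 / n_i + 1 / n_j)
    diff = mean(groups[i]) - mean(groups[j])
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical data: three groups of three observations, so df2 = N - g = 9 - 3 = 6.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
# 2.447 is the approximate 0.975 quantile of the t distribution with 6 df.
lower, upper = fisher_ci(groups, 0, 1, t_crit=2.447)
print(f"95% CI for the difference in means: ({lower:.3f}, {upper:.3f})")
```

If the interval excludes 0, the two population means are inferred to differ at the chosen confidence level.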