Testing the assumptions of ANOVA:
Overview of the ANOVA model:
ANOVA is a linear model used to decompose each observation into:
• A global mean: the overall average across the groups.
• A group effect: how much group j’s mean differs from the global mean.
• A random error: individual variation within the group.
Observation = overall average + group effect + random noise, i.e. X_ij = μ + α_j + ε_ij
The goal of ANOVA is to test whether:
• H₀: all group effects α_j = 0 (no group difference).
• H₁: at least one group effect α_j ≠ 0 (at least one group mean differs).
This leads to decomposing total sum of squares (SST) - total variation of
the data - into:
• Sum of squares within-groups (SSW / SSE): SSW = Σ_j Σ_i (X_ij − X̄_j)², the sum of the variation within each group.
• Sum of squares between-groups (SSB): SSB = Σ_j n_j (X̄_j − X̄)², the sum of the variation between the group means & the global mean.
Finally, the test statistic is F = (SSB / (k − 1)) / (SSW / (n − k)) = MSB / MSW.
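A minimal sketch of this computation in Python, using three small made-up groups (the data values are assumptions for illustration, not part of the notes); the hand-computed F is compared with scipy's built-in one-factor ANOVA:

import numpy as np
from scipy import stats

# Three made-up example groups (illustrative data only)
groups = [
    np.array([4.1, 5.0, 4.7, 5.3]),
    np.array([5.9, 6.2, 5.5, 6.8]),
    np.array([4.9, 5.1, 5.4, 4.6]),
]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total number of observations
grand_mean = np.concatenate(groups).mean()

# SSW: variation of the observations around their own group mean
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
# SSB: variation of the group means around the grand mean
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

F = (ssb / (k - 1)) / (ssw / (n - k))
p = stats.f.sf(F, k - 1, n - k)          # upper tail of F(k-1, n-k)
print(F, p)

# scipy's built-in one-factor ANOVA should give the same F and p-value
print(stats.f_oneway(*groups))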
However, which assumptions are required so that this test statistic follows an F(k − 1, n − k) distribution under H₀?
Assumptions necessary for ANOVA:
1. We assume that the response variable from each group (X | G) is
normally distributed.
2. Homoscedasticity: The population variances of the groups are equal (σ₁² = σ₂² = … = σₖ²).
3. The observations are random & independent (an observation of X
should not depend on any other observation in the sample).
We would like to test these assumptions. However, note that not all
assumptions are equally important.
The F-test usually still works even if the data is not perfectly normal.
• This implies that a violation of the normality assumption is not a big deal, and ANOVA will still give a credible result.
However, if there are large deviations from normality, especially in the
case of small group sample sizes, then the results may not be reliable.
• In this case, we can try transforming the data (to bring it closer to a normal distribution), or use an appropriate non-parametric test.
Non-parametric tests are tests that don’t assume that the data follows
a certain distribution (like the normal distribution).
• They are called “roughly equivalent” because they answer the same
question as ANOVA (are the group means different?) but in a
different way.
• The non-parametric equivalent of the one-factor ANOVA is called the Kruskal-Wallis test.
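A minimal sketch of the Kruskal-Wallis test with scipy, reusing the made-up groups from the earlier sketch (purely illustrative):

from scipy import stats

# Same made-up example groups as before (illustrative data only)
g1 = [4.1, 5.0, 4.7, 5.3]
g2 = [5.9, 6.2, 5.5, 6.8]
g3 = [4.9, 5.1, 5.4, 4.6]

# Kruskal-Wallis compares the groups on ranks rather than raw values,
# so it does not rely on the normality assumption.
stat, p = stats.kruskal(g1, g2, g3)
print(stat, p)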
We will test the normality assumption using any of the previously
developed normality tests (e.g. Lilliefors); see the sketch below.
• For this, each group has to be tested for normality separately, and all
groups have to be normally distributed for the assumption to hold.
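A minimal sketch of checking normality group by group with the Lilliefors test from statsmodels (the groups and the 0.05 threshold are assumptions for illustration):

from statsmodels.stats.diagnostic import lilliefors

# Made-up example groups (illustrative data only)
groups = {
    "group 1": [4.1, 5.0, 4.7, 5.3],
    "group 2": [5.9, 6.2, 5.5, 6.8],
    "group 3": [4.9, 5.1, 5.4, 4.6],
}

# Each group is tested separately; the normality assumption holds only
# if no group rejects it.
for name, values in groups.items():
    stat, p = lilliefors(values, dist="norm")
    print(name, stat, p, "reject normality" if p < 0.05 else "ok")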
The F-test is much more sensitive to violations in the equal variances
assumption.
• If the groups are the same size, small differences in variance don’t matter
much.
• If the groups are different sizes, unequal variances may lead to unreliable
results.
This assumption is usually tested using Bartlett’s test for homogeneity of variances; see the sketch below.
• If the assumption fails, a transformation of the variables can be applied.
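A minimal sketch of Bartlett’s test with scipy, again on the made-up groups and with an assumed 0.05 threshold:

from scipy import stats

# Same made-up example groups (illustrative data only)
g1 = [4.1, 5.0, 4.7, 5.3]
g2 = [5.9, 6.2, 5.5, 6.8]
g3 = [4.9, 5.1, 5.4, 4.6]

stat, p = stats.bartlett(g1, g2, g3)
if p < 0.05:
    # Evidence against equal variances: consider a transformation of the
    # data (e.g. a log transform) before rerunning the ANOVA.
    print("homoscedasticity rejected:", stat, p)
else:
    print("no evidence against equal variances:", stat, p)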