Introduction and overview of experimental design
Independent variable: thanks to randomization the groups in an experiment are more or less
comparable, but we deliberately make them differ on one (or a few) relevant variables (e.g. age,
gender, …)
Dependent variable: we compare the groups on a relevant outcome variable (it is assumed
that this variable is continuous, with an approximately normal distribution, so we will be
comparing means)
ANOVA: looking at differences between means
→ apart from the treatment, the only differences between 2 experimental groups are due to
chance (because of randomization)
Good experimental design: use 2 groups (one is control) made by randomization
Benefits of good experimental design
- isolates the treatment effect of interest from confounders
- reduces bias
- controls precision
- minimizes and quantifies random error or uncertainty
- simplifies and validates the analysis
- increases the external validity
external validity: whether the effects found in the experiment can also be seen in a real-life setting
Studies with humans vs. non-humans
- Human responses to treatments and interventions tend to be more variable; the
investigator in experiments with humans cannot control as many sources of variability
through design as can be done in the lab
- Human experiments tend to need larger numbers of participants to control this
random variation
- Experiments with nonhuman subjects tend to involve fewer constraints (ethics,
consent, etc.)
- Not generally possible to recruit and observe all subjects in human studies
simultaneously, as might be done in nonhuman trials
- Human studies have some design differences and tend to run longer
Randomized controlled trial (RCT): a special type of study, mostly into the effect of a
certain drug/intervention → mostly run in a regulatory context, with special rules
(ICH-E9)
Randomization tests: stay even closer to the general principle of randomization than
ANOVA
- nowadays randomization tests are used more often (they used to be very computer-intensive),
but ANOVA is still used a lot as it is easier and the outcomes are more or less the same
under general assumptions
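A randomization test can be sketched in a few lines. The example below is only an illustration under stated assumptions: two groups, the difference in means as the test statistic, and made-up outcome scores. The function name permutation_test and its parameters are hypothetical, not taken from the course material.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided randomization test for a difference in means (sketch)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Re-randomize: under the null hypothesis the group labels are arbitrary
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point ties
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Adding 1 to numerator and denominator keeps the estimate above 0
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical outcome scores, made up for illustration only
control = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3]
treated = [13.2, 14.1, 12.9, 13.8, 14.4, 13.5]

observed, p_value = permutation_test(treated, control)
print(f"difference in means = {observed:.2f}, permutation P = {p_value:.4f}")
```

The P-value here is the share of re-randomized data sets whose difference in means is at least as large as the observed one, which is why the method needs no normality assumption.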
Analysis of variance (ANOVA)
- basically an extension of the t-test
- comparing MEANS of more than two treatments/interventions
- null hypothesis: the population means of all groups are equal → this is what we try to
reject
- our hypothesis: not all population means are equal
With K = number of groups, N = number of measures (total, of all groups combined)
SS between: deviation of the treatment means around the overall mean → sum of
all squared estimated effects, each multiplied by the number of measures in its group
SS within: error variation based on the deviations of all observations from their own
treatment means
SS total: total variation based on the deviations of all observations from the grand mean
(SS total = SS between + SS within)
estimated effect: the treatment mean minus the overall mean
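F ratio: the between variance estimate (SS between / (K − 1)) divided by the within variance
estimate (SS within / (N − K)); around 1 when there is no effect and bigger than 1 when there
is an effect
→ to show an effect, the between variance estimate needs to be clearly bigger than the within
variance estimate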
p-value: the probability of observing an F value greater than or equal to the one
obtained GIVEN that the null hypothesis is true → the smaller the p-value the
greater the support for rejecting the null hypothesis (and concluding that not all
population means are equal)
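The sums of squares and the F ratio can be worked through by hand. The sketch below assumes that the estimated effect of a group is its mean minus the grand mean, and it uses small made-up data. The function name one_way_anova is hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F ratio for a one-way ANOVA."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k = len(groups)       # K: number of groups
    n = len(all_values)   # N: total number of measures

    # Assumed definition: estimated effect = group mean - grand mean
    effects = [mean(g) - grand_mean for g in groups]

    ss_between = sum(len(g) * e ** 2 for g, e in zip(groups, effects))
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": f_ratio,
    }


# Hypothetical data: three groups of three measures each
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
result = one_way_anova(groups)
print(result)
# The P-value is the upper tail of the F distribution with
# (df_between, df_within) degrees of freedom; if SciPy is installed,
# scipy.stats.f.sf(f_ratio, df_between, df_within) returns it.
```

For these data SS between is 24, SS within is 6 and SS total is 30, so the decomposition SS total = SS between + SS within can be checked directly, and F = (24 / 2) / (6 / 6) = 12.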
Reporting of the results
- try to avoid terms like ‘statistically significant’
- Estimate of effect: point estimate with direction and confidence interval (where
relevant). For ANOVAs with more than two groups there is no single effect estimate, but you
could report group means and use a method of multiple comparisons that produces confidence
intervals for the pairwise comparisons.
- Supporting statistics: test statistic (e.g. F-statistic for ANOVA), degrees of freedom
(e.g. between-group df and within-group df for ANOVA), and the P-value. The exact
P-value should be reported, unless the evidence is very strong (e.g. P = 0.03 is good and
P < 0.001 is also acceptable)
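As a small illustration of the reporting rule for supporting statistics, the helper below prints the F-statistic with both degrees of freedom and an exact P-value, switching to "P < 0.001" only for very small values. The function name format_anova_report and the numbers passed to it are hypothetical.

```python
def format_anova_report(f_ratio, df_between, df_within, p_value):
    """Format F, degrees of freedom and P-value for a results section."""
    if p_value < 0.001:
        p_text = "P < 0.001"            # very strong evidence: a bound is enough
    else:
        p_text = f"P = {p_value:.3f}"   # otherwise report the exact value
    return f"F({df_between}, {df_within}) = {f_ratio:.2f}, {p_text}"


# Hypothetical results, for illustration only
report_exact = format_anova_report(12.0, 2, 6, 0.008)
report_bound = format_anova_report(25.3, 3, 40, 0.00000002)
print(report_exact)
print(report_bound)
```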
Three assumptions of ANOVA
- independence of errors: you assume that the outcomes of different people
in a group do not depend on each other → violations can be partly prevented by
randomization
- equal error variance across treatments/groups (also known as the homogeneity
of variance assumption) → in the residual plot the red line should stay around zero
and the spread should be even; a funnel shape points to unequal variances
- normality of errors → equally large groups make the analysis less sensitive to
violations of this assumption
- a QQ plot is used to see if all errors combined form a normal distribution
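The checks on equal variance and normality both start from the residuals. The sketch below computes them for made-up data, compares the group variances with a simple ratio, and builds the pairs of points a QQ plot would show. All function names are hypothetical, and the variance ratio is only a rough screening number, not a formal test.

```python
from statistics import NormalDist, mean, variance


def residuals_by_group(groups):
    """Residual = observation minus the mean of its own group."""
    return [[y - mean(g) for y in g] for g in groups]


def variance_ratio(groups):
    """Largest group variance divided by the smallest (1 means equal)."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


def qq_points(residuals):
    """Pair standard normal quantiles with the sorted residuals."""
    ordered = sorted(residuals)
    n = len(ordered)
    normal = NormalDist()
    # Plotting positions (i - 0.5) / n stay strictly between 0 and 1
    theoretical = [normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical data: three groups of four measures each
groups = [
    [4.1, 5.0, 5.9, 5.2],
    [6.3, 7.1, 7.8, 6.6],
    [8.0, 9.4, 10.1, 8.7],
]

group_residuals = residuals_by_group(groups)
flat_residuals = [r for res in group_residuals for r in res]
points = qq_points(flat_residuals)

print(f"variance ratio (max / min) = {variance_ratio(groups):.2f}")
for theoretical, observed_residual in points:
    print(f"{theoretical:6.2f}  {observed_residual:6.2f}")
```

If the errors are roughly normal, the printed pairs fall close to a straight line when plotted against each other.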
ANCOVA
- extension of ANOVA to incorporate a continuous covariate (e.g. a baseline measurement)
- another way of reducing the noise term by accounting for individual differences that
are present
- use linear regression models to support the interpretation of the treatment effect
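To make the idea of adjusting for a baseline concrete, the sketch below fits the simplest ANCOVA model, outcome = intercept + treatment effect + slope × baseline, for two groups with a common slope. It uses the closed-form least-squares solution for this model rather than a regression library. The function name and the noise-free data are hypothetical, chosen so that the true treatment effect is 3 and the slope is 1.

```python
from statistics import mean


def ancova_two_groups(x_control, y_control, x_treated, y_treated):
    """Treatment effect adjusted for a baseline covariate (common slope)."""

    def within_sums(x, y):
        # Cross-products and squares around the group's own means
        mx, my = mean(x), mean(y)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx = sum((xi - mx) ** 2 for xi in x)
        return sxy, sxx

    sxy_c, sxx_c = within_sums(x_control, y_control)
    sxy_t, sxx_t = within_sums(x_treated, y_treated)

    # Pooled within-group slope of outcome on baseline
    slope = (sxy_c + sxy_t) / (sxx_c + sxx_t)

    unadjusted = mean(y_treated) - mean(y_control)
    # Remove the part of the difference explained by baseline imbalance
    adjusted = unadjusted - slope * (mean(x_treated) - mean(x_control))
    return slope, unadjusted, adjusted


# Hypothetical, noise-free data: outcome = 2 + baseline (+ 3 if treated)
baseline_control = [10, 12, 14, 16]
outcome_control = [12, 14, 16, 18]
baseline_treated = [11, 13, 15, 17]
outcome_treated = [16, 18, 20, 22]

slope, unadjusted, adjusted = ancova_two_groups(
    baseline_control, outcome_control, baseline_treated, outcome_treated
)
print(f"slope = {slope}, unadjusted = {unadjusted}, adjusted = {adjusted}")
```

In this example the treated group starts one point higher at baseline, so the raw difference in means (4) overstates the treatment effect, and the adjusted estimate recovers 3.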