Applied Data Analysis
Summary
Intro lecture
After successful completion of this course, you are expected to be able to:
- Recognize the main types of experimental and observational study design
- Choose the appropriate method of data analysis given the study design and type of
variables
- Prepare a protocol for data analysis
- Perform basic data analysis and interpret the results in a context of human intervention
trials and observational studies
- Quickly learn new data-analysis skills, which can be applied during thesis and research
- Understand the principles of sample size and study power calculation and conduct
these calculations for basic study designs
- Understand how stratification and regression analysis can be used to adjust for
confounding
- Understand the principles and procedures of energy adjustment and adjust
for energy using different methods.
The course is divided into ten topics:
1. SPSS
2. Practical modules
3. ANOVA
4. Analysis plan
5. Log-transformation and non-parametric tests
6. Logistic regression
7. Literature discussion
8. Sample size
9. Confounding
10. Energy adjustment
Lecture ANOVA
Intervention study designs:
- Parallel intervention study with more than two treatment arms
- Intervention study including baseline measurements
- 2x2 factorial design
- Repeated measures design
Parallel intervention study with more than two treatment arms
- One unexposed group, two exposed groups
Use:
- When you are interested in two different treatments for the same endpoint compared to
a placebo
Analysis:
- One-way ANOVA
o One continuous outcome (= dependent variable)
o One discrete exposure variable (= independent variable)
- H0 : μ1 = μ2 = μ3 (population means are equal)
Ha : at least one of the population means differs from the rest
One-way ANOVA: compares variances in your data
- Total variance: total sum of squares (SSt)
- Variance explained by treatment: model sum of squares (SSm, between groups)
- Unexplained variance: residual sum of squares (SSr, within groups)
You want: a large SSm and a small SSr
F-ratio: MSm/MSr
MS = SS/df
Df:
- Between groups: Ngroups - 1
- Within groups: Npeople - Ngroups
- Total: between-groups df + within-groups df (= Npeople - 1)
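As a rough illustration of the partition above, the sketch below computes SSt, SSm, SSr, the degrees of freedom and the F-ratio by hand. The arm names and outcome values are made up for the example, and the function name one_way_anova is an assumption, not something from the course material. It does not compute a p-value, which would need the F distribution.

```python
from statistics import mean

# Hypothetical outcome values for a placebo arm and two treatment arms.
groups = {
    "placebo": [5.0, 6.0, 7.0],
    "treatment_1": [7.0, 8.0, 9.0],
    "treatment_2": [9.0, 10.0, 11.0],
}


def one_way_anova(groups):
    """Partition the total sum of squares and return the F-ratio."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(values)

    # SSt: squared distance of every observation from the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    # SSm: squared distance of each group mean from the grand mean,
    # weighted by group size (between groups).
    ss_model = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # SSr: squared distance of every observation from its own group mean.
    ss_resid = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

    df_model = len(groups) - 1            # Ngroups - 1
    df_resid = len(values) - len(groups)  # Npeople - Ngroups
    f_ratio = (ss_model / df_model) / (ss_resid / df_resid)  # MSm / MSr
    return {
        "ss_total": ss_total,
        "ss_model": ss_model,
        "ss_resid": ss_resid,
        "df_model": df_model,
        "df_resid": df_resid,
        "f_ratio": f_ratio,
    }


result = one_way_anova(groups)
print(result)
```

In this toy data set SSm plus SSr adds up to SSt, which is a quick check that the partition was done correctly.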
Assumptions of ANOVA:
- Groups are more or less equal in size and have similar variances (homogeneity of
variance)
- Parametric test: the dependent variable is normally distributed (also within groups!)
What if assumptions are not met:
o Log-transformation
o Non-parametric test: Kruskal-Wallis
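Both options can be sketched in a few lines. The example below log-transforms a made-up, right-skewed outcome and also computes the Kruskal-Wallis H statistic from ranks. The data and the helper names midranks and kruskal_wallis_h are assumptions for illustration. The H formula is the basic version without a correction factor for ties, and no p-value is computed.

```python
import math

# Hypothetical right-skewed outcome values for three arms.
skewed = {
    "placebo": [1.2, 1.5, 2.0],
    "treatment_1": [2.4, 3.1, 9.8],
    "treatment_2": [4.0, 12.5, 30.1],
}

# Option 1: log-transform the outcome; the ANOVA is then run on these values.
logged = {arm: [math.log(y) for y in ys] for arm, ys in skewed.items()}


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Option 2: Kruskal-Wallis H statistic (no tie correction)."""
    values = [y for ys in groups.values() for y in ys]
    ranks = midranks(values)
    n_total = len(values)
    weighted = 0.0
    start = 0
    for ys in groups.values():
        rank_sum = sum(ranks[start:start + len(ys)])
        weighted += rank_sum ** 2 / len(ys)
        start += len(ys)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


h = kruskal_wallis_h(skewed)
print(round(h, 3))
```

Because the test only uses ranks, it gives the same H for the raw and the log-transformed values.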
Contrast and Post-Hoc tests
Contrast: when you have a specific hypothesis (each contrast compares two chunks of
variance)
E.g. compare the exposure groups, taking the placebo group as the reference group
- Simple (first): each category is compared to the first category
- Simple (last): each category is compared to the last category
- Repeated: each category (except the first) is compared to the previous category
Post-Hoc: when you have no specific hypothesis (LSD, Tukey, Bonferroni and Dunnett)
- Pairwise comparisons that are designed to compare all different combinations of the
treatment groups
- Adjust for multiple comparisons
o LSD: comparable to a t-test for each pair of treatments (multiple t-tests at
the same time), without correcting the p-values
o Tukey: the overall significance level of 0.05 holds across all pairwise comparisons together
o Bonferroni: each p-value is multiplied by the number of comparisons (maximum 1)
o Dunnett: used to compare several treatments simultaneously with one control
Dunnett can only be used to compare treatments with a single control (placebo) group,
which is the case here
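The Bonferroni rule is simple enough to sketch directly. The p-values below are invented for the example and the function name bonferroni is an assumption. The other procedures (LSD, Tukey, Dunnett) are not reproduced here.

```python
# Hypothetical unadjusted p-values for the three pairwise comparisons.
raw_p = {
    "treatment_1 vs placebo": 0.010,
    "treatment_2 vs placebo": 0.030,
    "treatment_1 vs treatment_2": 0.400,
}


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


adjusted = bonferroni(raw_p)
for name, p in adjusted.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: adjusted p = {p:.3f} ({verdict})")
```

In this made-up example the second comparison is below 0.05 before adjustment but not after it, which shows why the choice of post-hoc test matters.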
Intervention study including baseline measurements
Only two groups: unexposed and exposed
Two measurements: at the beginning and at the end
Analysis: ANCOVA
- One continuous outcome (=dependent variable)
- One discrete exposure variable (= independent variable)
- A covariate (continuous independent variable; here the baseline measurement)
- Hypothesis:
o H0 : μ1 = μ2 (population means are equal while controlling for the effect of
one (or more) other variables)
o Ha : the population means differ
- Total variance: SSt
- Variance explained by the treatment: SSm (between groups)
- Unexplained variance: SSr (within groups)
o Part of SSr is explained by the covariate
ANCOVA removes the variance explained by the covariate from the unexplained
variance before the F-ratio is recalculated.
Therefore the unexplained variance becomes smaller and the test becomes more powerful.
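The gain in power can be made concrete with a small hand calculation. The sketch below uses the classical single-covariate ANCOVA formulas, with the baseline measurement as the covariate and a common within-group slope assumed. The data and the names sum_products and ancova are hypothetical.

```python
from statistics import mean


def sum_products(xs, ys):
    """Sum of cross-products of deviations (a sum of squares when xs is ys)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def ancova(groups):
    """groups maps arm name -> (baseline values, end-of-study values)."""
    k = len(groups)
    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    n = len(all_y)

    # Within-group sums of squares and cross-products, pooled over arms.
    sxx_w = sum(sum_products(xs, xs) for xs, _ in groups.values())
    syy_w = sum(sum_products(ys, ys) for _, ys in groups.values())
    sxy_w = sum(sum_products(xs, ys) for xs, ys in groups.values())
    # The same quantities ignoring the arms.
    sxx_t = sum_products(all_x, all_x)
    syy_t = sum_products(all_y, all_y)
    sxy_t = sum_products(all_x, all_y)

    # Ordinary ANOVA: SSr is the full within-group variation in the outcome.
    f_unadjusted = ((syy_t - syy_w) / (k - 1)) / (syy_w / (n - k))

    # ANCOVA: remove the part of the variation explained by the covariate.
    ss_resid_adj = syy_w - sxy_w ** 2 / sxx_w
    ss_total_adj = syy_t - sxy_t ** 2 / sxx_t
    ss_model_adj = ss_total_adj - ss_resid_adj
    # One extra degree of freedom is spent on the covariate slope.
    f_adjusted = (ss_model_adj / (k - 1)) / (ss_resid_adj / (n - k - 1))

    return {
        "ss_resid_unadjusted": syy_w,
        "ss_resid_adjusted": ss_resid_adj,
        "f_unadjusted": f_unadjusted,
        "f_adjusted": f_adjusted,
    }


# Hypothetical baseline and end-of-study values for two arms.
trial = {
    "unexposed": ([10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 15.0, 16.0]),
    "exposed": ([11.0, 13.0, 15.0, 17.0], [14.0, 15.0, 18.0, 20.0]),
}
ancova_result = ancova(trial)
print(ancova_result)
```

In this toy data set the baseline explains most of the within-group variation, so the adjusted residual sum of squares is much smaller and the adjusted F-ratio is larger than the unadjusted one.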
2x2 factorial design
4 groups, with 2 exposures (group 1: exposure 1, group 2: exposure 2, group 3: both
exposures, group 4: unexposed)
Compare two exposures at the same time with a placebo group
Why do you use it:
- Study interaction
o In epidemiology: effect modification
o Interaction shows how the effect of one independent variable (exposure) depends
on the level of another
- Efficiency (especially when there is no interaction between the two different exposures)
Analysis:
Two-way ANOVA
- One continuous outcome (=dependent variable)
- Two discrete exposure variables (=independent variables)
Different participants are required in each of the four groups
- Total variance (SSt)
- Unexplained variance (SSr, within groups)
- Variance explained by treatment (SSm, between groups)
o Variance explained by treatment A (SSa)
o Variance explained by treatment B (SSb)
o Variance explained by the interaction of A and B (SSa*b)
- When there is no interaction, you can pool the groups: everyone who received
treatment A is compared with everyone who did not, and likewise for treatment B
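The split of SSm into SSa, SSb and SSa*b can be sketched for a balanced 2x2 design, where every cell has the same number of participants. The cell values and the function name two_way_anova_2x2 are made up for illustration. Obtaining the interaction term by subtraction, as done here, only works when the design is balanced.

```python
from statistics import mean

# Hypothetical outcomes; keys are (exposure A, exposure B) with 0 = no, 1 = yes.
cells = {
    (0, 0): [4.0, 5.0, 6.0],     # unexposed
    (1, 0): [6.0, 7.0, 8.0],     # exposure A only
    (0, 1): [7.0, 8.0, 9.0],     # exposure B only
    (1, 1): [13.0, 14.0, 15.0],  # both exposures
}


def two_way_anova_2x2(cells):
    """Partition the sums of squares for a balanced 2x2 factorial design."""
    values = [y for ys in cells.values() for y in ys]
    grand_mean = mean(values)

    ss_total = sum((y - grand_mean) ** 2 for y in values)
    ss_resid = sum((y - mean(ys)) ** 2 for ys in cells.values() for y in ys)
    ss_model = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in cells.values())

    def main_effect(position):
        # Pool the cells by one exposure, ignoring the other.
        ss = 0.0
        for level in (0, 1):
            pooled = [y for key, ys in cells.items() if key[position] == level for y in ys]
            ss += len(pooled) * (mean(pooled) - grand_mean) ** 2
        return ss

    ss_a = main_effect(0)
    ss_b = main_effect(1)
    ss_ab = ss_model - ss_a - ss_b  # valid for a balanced design only

    ms_resid = ss_resid / (len(values) - len(cells))
    # Each effect has 1 degree of freedom in a 2x2 design, so MS = SS.
    return {
        "ss_total": ss_total,
        "ss_resid": ss_resid,
        "ss_a": ss_a,
        "ss_b": ss_b,
        "ss_ab": ss_ab,
        "f_a": ss_a / ms_resid,
        "f_b": ss_b / ms_resid,
        "f_ab": ss_ab / ms_resid,
    }


factorial = two_way_anova_2x2(cells)
print(factorial)
```

Here the combined exposure raises the outcome by more than the two separate effects added together, so the interaction term is not zero.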
Repeated measures design
Why do we use it?
- Interested in the change over time compared between treatment groups
Analysis:
Repeated measures ANOVA
- Continuous outcome measured more than once over time on the same subject
- One discrete exposure variable
- Two types of variation:
o Between-subject variation: treatment (exposure)
o Within-subject variation: more measurements on same subject in time (take
correlation into account)
- Equal variance assumption: in this test the sphericity assumption (Mauchly's test of
sphericity: P>0.05 -> the variances are more or less equal; P<0.05 -> sphericity is
violated, so adjust the results with the Greenhouse-Geisser correction)
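Sphericity is usually described as equal variances of the differences between every pair of time points. The sketch below only computes those variances for made-up data so they can be compared by eye. It does not implement Mauchly's test or the Greenhouse-Geisser correction, and the name difference_variances is an assumption.

```python
from itertools import combinations
from statistics import variance

# Hypothetical data: one row per subject, one column per time point.
measurements = [
    [10.0, 12.0, 15.0],
    [11.0, 12.0, 17.0],
    [9.0, 13.0, 14.0],
    [12.0, 15.0, 18.0],
]


def difference_variances(rows):
    """Sample variance of the within-subject differences for each pair of time points."""
    n_times = len(rows[0])
    result = {}
    for first, second in combinations(range(n_times), 2):
        differences = [row[second] - row[first] for row in rows]
        result[(first, second)] = variance(differences)
    return result


variances = difference_variances(measurements)
for pair, value in variances.items():
    print(f"time points {pair}: variance of differences = {value:.2f}")
```

Very unequal values across the pairs would be a reason to look closely at Mauchly's test in the statistical software output.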
To summarize:
ANOVA can be used for: