Introduction
If manipulation is impossible → observing the relations between variables
BUT 3 complications …
- Uncertainty: human behavior is influenced by many factors, many of which are unknown
o Solution: better theories, more knowledge, improved control
- (Measurement) noise: measurement instruments are far from perfect
o Solution: better measurement
- Variation: effects and relations in the behavioral sciences vary → variation over
situations, times and a huge variation across people
o Solution: better understanding of variation, knowing how and why effects vary
BUT humans are prone to reasoning fallacies (e.g. confirmation bias), especially when faced with a lot of
noise and variation
Basic statistical models: analysis of variance (ANOVA), linear regression and logistic regression
→ more advanced models: multilevel models and structural equation models
XY
- X: predictor, independent variable (usually multiple)
- Y: outcome, criterion, dependent variable (explained by one or more X’s)
read.table("./Data/Chapter1/Ch1DataExample.csv",     Reading a csv file
           header=TRUE, sep=";")
install.packages("A")                                Installing package A
library("A")                                         Loading package A
cat(version$version.string)                          Check which version of R you have
H1: The good old one-way ANOVA
ANOVA (ANalysis Of VAriance) = statistical methodology to compare the means of 2 or more
groups (~ generalization of the independent groups t-test)
Data pass the interocular trauma test when the conclusion hits you between the eyes: if you
know what the data mean and the effect is that obvious, no further statistical analysis is needed
𝒚𝒊𝒋 = score of person 𝑖 in condition 𝑗 ➔ 𝑖 and 𝑗 are running indices
- 𝑖 = 1, …, 𝑚𝑗 (𝑚𝑗 persons in condition 𝑗)
- 𝑗 = 1, …, 𝑎 (𝑎 conditions/groups)
o 𝑎 = levels of a factor
Balanced design = all 𝑚𝑗's are equal; unbalanced design = not all 𝑚𝑗's are equal
𝑛 = total number of participants
𝑦̅𝑗 = sample average in condition 𝑗
𝑦̅ = grand sample average
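A minimal R sketch of this notation, with made-up scores for a = 3 conditions and mj = 4 persons per condition (a balanced design):

y    <- c(5, 6, 7, 6,  8, 9, 7, 8,  4, 5, 5, 6)   # scores y_ij (hypothetical)
cond <- factor(rep(1:3, each = 4))                # condition index j
m_j    <- table(cond)                             # group sizes m_j (all equal -> balanced)
n      <- length(y)                               # total number of participants
ybar_j <- tapply(y, cond, mean)                   # condition averages ybar_j
ybar   <- mean(y)                                 # grand sample average ybar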
Step-by-step
1. Models and hypotheses
2. Choice of the test statistic
3. The sampling distribution of F under H0 and what to conclude
4. Determine the size of your effect
STEP 1: MODELS AND HYPOTHESES
ANOVA: comparison of 2 statistical models
→ generative models: they specify how the scores on the criterion variable are generated
- The full model: 𝒚𝒊𝒋 = 𝝁𝒋 + 𝜺𝒊𝒋
o Observation = systematic (structural or signal) part + random deviation
(stochastic 𝜀𝑖𝑗 or noise)
o Population mean has index 𝑗 → can differ across conditions
- The reduced model: 𝒚𝒊𝒋 = 𝝁 + 𝜺𝒊𝒋
o Special case of the full model, nested in the full model
o Assumption that 𝑎 means are all equal
o 𝐻0: 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑎
Parameter = has a certain value in the population, but is unknown to us (e.g. the population
means in the full and reduced model) → draw a sample, make observations and estimate it
- Estimated parameter is indicated with a hat (e.g. 𝜇̂ )
- Fitted value = based on the estimated parameters, this is the best guess for an
observation based on the model ➔ model-based approximation to the observed score
Least squares estimation = look for the values of the parameters that minimize the sum of
squared differences between what is observed and what the model says it should be → the
standard method of estimation in ANOVA
Qreduced(μ) = ΣjΣi (yij − μ)² = sum of squared differences → a function of the unknown
parameter μ → find the value of μ that minimizes Qreduced(μ)
- 𝑦𝑖𝑗 − 𝜇: difference between an observation and what the model tells us = residual 𝒆𝒊𝒋
o Large residual: model does a bad job in explaining that observation
o Small residual: model does a good job
- Squared so that positive and negative residuals don’t cancel each other out
- Reduced model: μ̂ = ŷij^red = ȳ
o Estimated parameter = fitted values = grand sample average
Qfull(μ1, …, μa) = ΣjΣi (yij − μj)², now with a condition-specific mean μj
- Full model: μ̂j = ŷij^full = ȳj
o Estimated population mean of condition j = sample average of condition j
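As a small illustration of these least-squares results, an R sketch with the same kind of made-up data: the fitted values of the reduced (intercept-only) model equal the grand sample average, those of the full model equal the condition averages.

y    <- c(5, 6, 7, 6,  8, 9, 7, 8,  4, 5, 5, 6)   # hypothetical scores
cond <- factor(rep(1:3, each = 4))
fit_reduced <- lm(y ~ 1)        # reduced model: one common mean mu
fit_full    <- lm(y ~ cond)     # full model: one mean per condition
unique(fitted(fit_reduced))     # = grand sample average ybar
unique(fitted(fit_full))        # = condition averages ybar_j
tapply(y, cond, mean)           # the same condition averages, computed directly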
Sum of squares = error sum of squares = residual sum of squares = summary measure of the
size of the residuals
! Reduced model is NOT always a model with a single mean for all groups !
SSTot (total sum of squares) = measures total variation in the data
- One-way ANOVA: SSTot = SSEreduced
- SSEreduced ≥ SSEfull
- SSEff = effect sum of squares = difference between the two error sums of squares:
SSEff = SSEreduced − SSEfull
o Expresses how much we can decrease the error by taking the different
groups/conditions into account
o Shorter way of computing SSEff in a one-way design: SSEff = Σj mj (ȳj − ȳ)²
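A small R sketch (made-up data) of the sums of squares and the one-way shortcut for SSEff:

y    <- c(5, 6, 7, 6,  8, 9, 7, 8,  4, 5, 5, 6)   # hypothetical scores
cond <- factor(rep(1:3, each = 4))
SSE_reduced <- sum((y - mean(y))^2)          # = SSTot in a one-way ANOVA
SSE_full    <- sum((y - ave(y, cond))^2)     # residuals around the condition averages
SSEff       <- SSE_reduced - SSE_full        # error reduction due to the conditions
m_j <- as.numeric(table(cond))
SSEff_short <- sum(m_j * (tapply(y, cond, mean) - mean(y))^2)   # same value via the shortcut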
Problems with interpreting the magnitude of SSE and SSEff
- Problem of scaling: SS can’t be interpreted meaningfully in an absolute way, but only
relative to each other (e.g. multiply scores with 100 → SS increase with factor 100²)
- SSE of the reduced model is always larger than or equal to that of the full model (the full
model is more complex and flexible, so it has smaller residuals)
o If H0 is true (reduced model adequate) → small difference between the SS, BUT what is
"small"? → solution: take the degrees of freedom into account (~ complexity of the models)
Degrees of freedom ~ complexity of the model
- Reduced model: there are only n – 1 independent numbers, not n, because if you know n – 1
residuals of the reduced model, then you also know the last one (their sum is 0)
- Full model: summing the degrees of freedom per condition gives n – a (because there are a
condition-specific population means)
- Larger degrees of freedom: simpler model (with a smaller number of parameters)
- df = number of observations – number of freely estimated parameters
Mean squares
- Mean square = sum of squares divided by its degrees of freedom: MSEfull = SSEfull / (n – a)
and MSEff = SSEff / (a – 1)
- Degrees of freedom of SSEff = a – 1 because it's the difference between the degrees of
freedom of the reduced and the full model → (n – 1) – (n – a) = a – 1
Effect parameter 𝛼j
- 𝛼j = 𝜇j − 𝜇 = effect or deviation of condition j compared to the grand mean 𝜇
→ how much the mean of group j differs from the overall mean
- Sum of effect parameters = 0
- E.g. height male/female: population mean 170, female 160, male 180 cm
→ 𝛼𝑚𝑎𝑙𝑒 = 10 cm 𝛼𝑓𝑒𝑚𝑎𝑙𝑒 = - 10 cm → sum is 0
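A quick R check (made-up data) that the estimated effect parameters sum to 0 in a balanced design:

y    <- c(5, 6, 7, 6,  8, 9, 7, 8,  4, 5, 5, 6)   # hypothetical scores
cond <- factor(rep(1:3, each = 4))
alpha_hat <- tapply(y, cond, mean) - mean(y)   # deviation of each condition from the grand mean
alpha_hat
sum(alpha_hat)                                 # 0 (up to rounding)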
STEP 2: CHOICE OF THE TEST STATISTIC
Find out if we can collect evidence against the reduced model (and H0) in favor of the full model
- Looking at lack of fit of the model to the data
- Evaluate complexity of the model
➔ Fit & complexity are opposing quantities: if one goes up, the other goes down
➔ Is the decrease in SSE (full < reduced) of the full model large enough to justify its
increase in complexity (full > reduced)?
F-statistic: F = MSEff / MSEfull
- Perspective 1: F is a fraction consisting of a numerator (top) and a denominator (bottom)
o Numerator: variability between the sample averages of the conditions/groups
▪ Sampling variability: randomness
▪ Systematic variability: effect of manipulation
o Denominator: variability within conditions
▪ Only sampling variability: randomness
- Perspective 2: F is clarified by taking the expected values of MSEff and MSEfull
o E(MSEfull) = σ² and, for a balanced design with group size m, E(MSEff) = σ² + m · Σj 𝛼j² / (a – 1)
o m = group sample size → the larger m, the larger the F value (for the same effect)
o The fraction Σj 𝛼j² / (a – 1) reflects the effect size
o Under the reduced model (H0): all 𝛼j = 0 → E(MSEff) = σ²
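A small R sketch (made-up data) that computes the mean squares and the F statistic by hand and checks them against anova():

y    <- c(5, 6, 7, 6,  8, 9, 7, 8,  4, 5, 5, 6)   # hypothetical scores
cond <- factor(rep(1:3, each = 4))
a <- nlevels(cond); n <- length(y)
SSE_full <- sum((y - ave(y, cond))^2)
SSEff    <- sum((y - mean(y))^2) - SSE_full
MSEff   <- SSEff / (a - 1)        # mean square of the effect
MSEfull <- SSE_full / (n - a)     # mean square error of the full model
F_obs   <- MSEff / MSEfull
anova(lm(y ~ cond))               # same F value in the ANOVA table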
STEP 3: THE SAMPLING DISTRIBUTION OF F UNDER H0 AND WHAT TO CONCLUDE
p value = the probability, given H0, to find an equally or more extreme value
of the F-statistic ➔ pr(F ≥ Fobs | H0 is true)
- p value = significance probability = observed significance level = probability level
- Conditional probability: given that H0 (the reduced model) is true
Interpretation p value
- Fisher: interpreted in a continuous way as evidence against H0
o Smaller p-value: more evidence against H0 (no effect of conditions)
- Neyman & Pearson: binary decision of rejecting H0 or not
o Comparing p value with nominal significance level 𝛼 (= 0.05, 0.01, 0.001, …)
▪ p < 𝛼: reject H0 → significant result
▪ p ≥ 𝛼: don’t reject H0
o Can also use F values directly
▪ Fobs ≥ F1−α; a−1, n−a: reject H0 (otherwise don't reject)
▪ F1−α; a−1, n−a = the 100·(1 – 𝛼) percentile of the F distribution with a – 1 and n – a
degrees of freedom
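A short R sketch of this step, using hypothetical values for Fobs, a and n (not taken from the course data):

F_obs <- 9.0; a <- 3; n <- 12; alpha <- 0.05        # hypothetical values
p_value <- pf(F_obs, df1 = a - 1, df2 = n - a, lower.tail = FALSE)   # Pr(F >= F_obs | H0)
F_crit  <- qf(1 - alpha, df1 = a - 1, df2 = n - a)  # 100*(1 - alpha) percentile of F
F_obs >= F_crit                                     # TRUE here -> reject H0 at level alpha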
STEP 4: DETERMINE THE SIZE OF YOUR EFFECT
Important to judge whether a result is (besides statistically significant) also practically or
clinically significant → effect size
- Association measure: proportion of variance explained → how strongly is the variation in
outcome associated with variation in the conditions in the population?
- 𝜼² = population proportion of variance explained
o Estimated using sample statistics: 𝜂̂², R² or 𝜔̂²
SSTot: total sum of squares
- Measures the deviation of the observations from the grand sample average → index of
total variability in the sample
- Variance to be explained via the ANOVA model
- In one-way ANOVA: SSTot = SSEreduced = SSEff + SSEfull
SSEfull: error sum of squares
- How much variability is left unexplained under the full model (with conditions)
- Variability within conditions or groups
SSEff: effect sum of squares
- Difference between the variability to be explained and the unexplained variability
- Explained variability
Disadvantage of 𝜂̂²: biased estimator of the true proportion of variance explained 𝜂²
- If 𝜂² = 0: the true effect is 0, the factor is not associated with the outcome
o Then E(𝜂̂²) > 0: positive bias
- E(𝜂̂²) = 0 would require one of the following …
o Positive and negative values of 𝜂̂² cancel each other out → CAN'T happen
because 𝜂̂² can never be negative
o 𝜂̂² = 0 in each sample → very unlikely because the condition sample averages will
never be exactly equal to each other due to sampling variability → small positive
values of 𝜂̂²
Unbiased proportion of variance explained: 𝝎̂²
- Unbiased estimator of the proportion of variance explained in the population
- If 𝜂² = 0 ➔ then E(𝜔̂²) = 0
- 𝜔̂² is usually smaller than 𝜂̂², and a better (less biased) estimator of 𝜂²
- Downside: 𝜔̂² can become negative (needed to attain a 0 average value when there is no effect)
o Usually set 𝜔̂² = 0 if it is smaller than 0 !
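A small R sketch (made-up data) computing 𝜂̂² and 𝜔̂²; the 𝜔̂² line uses the usual one-way estimator (SSEff − (a − 1)·MSEfull)/(SSTot + MSEfull):

y    <- c(5, 6, 7, 6,  8, 9, 7, 8,  4, 5, 5, 6)   # hypothetical scores
cond <- factor(rep(1:3, each = 4))
a <- nlevels(cond); n <- length(y)
SSTot    <- sum((y - mean(y))^2)
SSE_full <- sum((y - ave(y, cond))^2)
SSEff    <- SSTot - SSE_full
MSEfull  <- SSE_full / (n - a)
eta2   <- SSEff / SSTot                                    # proportion of variance explained
omega2 <- (SSEff - (a - 1) * MSEfull) / (SSTot + MSEfull)  # less biased estimator
omega2 <- max(omega2, 0)                                   # set to 0 if negative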
Rule of thumb (but also depends on domain of research)
- Proportion variance 1%: small effect
- 6%: medium effect
- 14%: large effect
We don't use the F statistic or the p value as a measure of effect size because they depend on
both the size of the effect and the sample size
➔ you can have a very small effect, but a very large F value (or small p value) because of a huge
sample
Uncertainty of effect size estimates (𝜂̂² and 𝜔̂²) → confidence intervals
→ only for quantities of primary interest ((differences between) means, effect sizes, …)
Practical session 1
pnorm(x)          Probability that a (standard) normally distributed variable is smaller
                  than or equal to x
pt(t, df)         Probability that a t statistic is smaller than or equal to t, with df
                  degrees of freedom
                  ➔ computes the cumulative probability (area under the curve) for a
                  t-distribution given a t-score and degrees of freedom
qt(p, df)         Quantile: which value corresponds to probability p for a t distribution
                  with df degrees of freedom?
pf(F, df1, df2)   Cumulative probability of a smaller or equal F value under the F
                  distribution with df1 and df2 degrees of freedom ➔ the p value of an
                  observed F is the upper tail, 1 – pf(Fobs, df1, df2)
qf(p, df1, df2)   F value (quantile p) of the F distribution with df1 and df2 degrees of
                  freedom
Pr(H0 is true) ➔ prior probability (Bayesian): subjective (different for different people) and
unknown
Pr(H0 is true | Fobs) ➔ posterior probability (Bayesian): based on the data
Larger n → smaller critical value
Smaller α → larger critical value
Larger a → larger critical value
See “test yourself” !! (8?)
H2: Contrasts, or how to be more specific
2.1 DATA EXAMPLE: THE TREATMENT OF DEPRESSION REVISITED
Preregistration = the research questions, hypotheses, design and plan of analysis are specified
before the data have been collected → open science
- Written in a time-stamped and publicly accessible document
- Researcher can’t change his hypotheses from exploratory or post-hoc to confirmatory or
planned
2.2 GOAL OF THIS CHAPTER
F-test checks if there are differences between the conditions
- But it could be that condition 1 differs from condition 2 but not from condition 3, or that they all differ, or …
Analysis of contrasts checks which conditions differ from each other and how much they differ
2.3 SOME TERMINOLOGY
Contrast / comparison = difference in the averages of 2 or more conditions (e.g. placebo vs
treatment)
Pairwise contrast = simple difference between the averages of 2 conditions (e.g. 𝑦̅1 − 𝑦̅2 )
Complex contrast = more complicated difference between 2 elements, where one or both of
these elements are averages of several conditions (e.g. between the placebo condition and the
average of the 2 (or more) treatment groups: ȳ1 − ½ (ȳ2 + ȳ3))
Contrast = linear combination of sample averages: g = c1 ȳ1 + c2 ȳ2 + … + ca ȳa = Σj cj ȳj
- Coefficients cj sum to 0 (and are known)
Population contrast:
- Population value of the contrast 𝛾 = Σj cj 𝜇j
- Sample estimate g = Σj cj ȳj
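A minimal R sketch (made-up data) of a contrast estimate g, here with the placebo-vs-treatments coefficients from the example above:

y    <- c(5, 6, 7, 6,  8, 9, 7, 8,  4, 5, 5, 6)   # hypothetical scores
cond <- factor(rep(1:3, each = 4))                # condition 1 = "placebo" in this sketch
ybar_j <- tapply(y, cond, mean)
c_j    <- c(1, -1/2, -1/2)                        # coefficients sum to 0
g      <- sum(c_j * ybar_j)                       # sample value of the contrast
g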
Planned contrast = specified before the data have been collected or seen
Post-hoc contrast = inspired by looking at the data (e.g. difference between group 1 and 3 looks
large, let’s test it)
Multiple contrasts, multiple comparisons = more than one planned contrast
Multiple post-hoc contrasts
2.4 A SINGLE PLANNED CONTRAST
2.4.1 DERIVATION OF THE SAMPLING DISTRIBUTION OF G
Sampling distribution of g under a statistical model (e.g. full model) quantifies the uncertainty
around the sample contrast value g
1. FORM OF THE DISTRIBUTION OF G
If 𝑦𝑖𝑗 is normally distributed → sample averages of these observations + every linear
combination of the sample averages (= contrasts) are also normally distributed
Normal distribution is completely determined by its mean and variance
2. EXPECTED VALUE OF G
Expected value of linear combination = linear combination of expected values
g = unbiased estimator of 𝛾
3. VARIANCE OF G
Variance of the sum = sum of variances (because iid)
Variance of the sample average = variance of single observation (𝜎 2 ) divided by number of
observations of sample average (mj)
4. SUMMARY
Standard error of g = uncertainty in g, based on sample quantities: SE(g) = √(MSEfull · Σj cj²/mj)
Replace the unknown 𝜎² by an estimate based on the data: MSEfull
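Putting these pieces together, a small R sketch (made-up data) of SE(g):

y    <- c(5, 6, 7, 6,  8, 9, 7, 8,  4, 5, 5, 6)   # hypothetical scores
cond <- factor(rep(1:3, each = 4))
a <- nlevels(cond); n <- length(y)
m_j     <- as.numeric(table(cond))
MSEfull <- sum((y - ave(y, cond))^2) / (n - a)    # estimate of sigma^2
c_j  <- c(1, -1/2, -1/2)
se_g <- sqrt(MSEfull * sum(c_j^2 / m_j))          # SE(g) = sqrt(MSEfull * sum c_j^2 / m_j)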
2.4.2 STATISTICAL INFERENCE FOR A SINGLE PLANNED CONTRAST
Statistical inference for 𝛾 → confidence interval or hypothesis testing
1. CONFIDENCE INTERVAL (CI)
100·(1 – 𝜶)% confidence interval for a single planned contrast: g ± t1−α/2; n−a · SE(g)
95% CI (𝛼 = 0.05) → 97.5% quantile; 99% CI (𝛼 = 0.01) → 99.5% quantile
Half-width of the confidence interval = t1−α/2; n−a · SE(g)
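A short R sketch of this interval, with hypothetical values for g, SE(g), n and a:

g <- 2.5; se_g <- 0.9; n <- 12; a <- 3; alpha <- 0.05    # hypothetical values
t_mult <- qt(1 - alpha / 2, df = n - a)    # 97.5% quantile of the t distribution for a 95% CI
ci <- c(lower = g - t_mult * se_g, upper = g + t_mult * se_g)
half_width <- t_mult * se_g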
2. HYPOTHESIS TEST
H0: 𝛾 = C, with C a hypothesized value → usually C = 0
t = (g – C) / SE(g) = difference between what we observe (g) and what is hypothesized (C),
divided by the uncertainty in g (SE(g)) ➔ if H0 is true, then t follows a t distribution with
n – a degrees of freedom
Square of the t-statistic = F-statistic (when comparing the full and reduced model in an F-test)
Effect size
- If well defined measurement scale (e.g. meter, euro, °C) → contrast value g (with CI)
- Standardized effect size measure (without measurement units) → Cohen's d: d = (ȳ1 – ȳ2)/ŝ
o Difference of 2 means divided by the estimate ŝ of the within-group
standard deviation (common to both groups), here ŝ = √MSEfull
o Cohen's d = estimate of the population value 𝜹 (delta)
o For pairwise contrasts (for complex contrasts: use the sample value of its numerator)
- Interpretation cohen’s d
o Around 0.2 = small effect
o Around 0.5 = medium effect
o Around 0.8 = large effect
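A small R sketch (made-up data) of the t test for a pairwise contrast with C = 0, plus Cohen's d using √MSEfull as the within-group standard deviation estimate:

y    <- c(5, 6, 7, 6,  8, 9, 7, 8,  4, 5, 5, 6)   # hypothetical scores
cond <- factor(rep(1:3, each = 4))
a <- nlevels(cond); n <- length(y)
ybar_j  <- tapply(y, cond, mean)
m_j     <- as.numeric(table(cond))
MSEfull <- sum((y - ave(y, cond))^2) / (n - a)
c_j  <- c(1, -1, 0)                               # pairwise contrast: condition 1 vs condition 2
g    <- sum(c_j * ybar_j)
se_g <- sqrt(MSEfull * sum(c_j^2 / m_j))
t_obs <- (g - 0) / se_g                           # hypothesized value C = 0
p_val <- 2 * pt(abs(t_obs), df = n - a, lower.tail = FALSE)   # two-sided p value
d     <- g / sqrt(MSEfull)                        # standardized difference of the two means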
Streetwise statistics: if the sample sizes are large enough (e.g. dffull > 30) → the t-distribution
looks like the standard normal distribution → we can use the standard normal distribution for
CIs and testing
For large sample sizes, a 95% CI can be calculated by using 2 as a multiplier for SE(g) instead of
z0.975 = 1.96 → rough hypothesis test: compare the absolute value of the t-statistic with 2 to
evaluate significance
2.5 MULTIPLE TESTING : MANY PLANNED CONTRASTS
Illustration: suppose a test gives 5% false alarms; doctor 1 tests each patient for disease A only,
doctor 2 tests each patient for diseases A, B, C, D and E → if nobody has any disease, then per
1000 patients doctor 1 gets about 50 false alarms and doctor 2 about 226 (at least one false
alarm per patient with probability 1 − 0.95⁵ ≈ 0.23)
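The illustration in numbers, as a small R sketch assuming 1000 healthy patients per doctor (a number consistent with the 50 and 226 counts):

alpha <- 0.05; n_patients <- 1000     # assumed number of healthy patients per doctor
n_patients * alpha                    # doctor 1: one test each -> about 50 false alarms
n_patients * (1 - (1 - alpha)^5)      # doctor 2: >= 1 false alarm in 5 tests -> about 226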