Univariate analysis (exam 1)
Group differences
Introduction
Ecological problems
Lot of sites/plots
Many variables
Lot of natural variation (noise)
solution: statistics
Sample size (N)
Larger N => better estimate of mean & s; estimates stabilize
Larger N => smaller CI (confidence interval)
Don’t learn standard deviation (SD or s) formula
Standard deviation = s (sample estimate of the population σ)
Variance = s²
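The relation between s and s² can be checked with a minimal Python sketch. The weights below are made-up values for illustration only; `statistics.stdev` and `statistics.variance` use the sample (N – 1) versions.

```python
import statistics

# Hypothetical body weights (g) from one plot; made-up values.
weights = [12.0, 15.0, 11.0, 14.0, 13.0]

mean = statistics.mean(weights)          # sample mean (X̄)
s = statistics.stdev(weights)            # sample standard deviation (s)
variance = statistics.variance(weights)  # sample variance (s²)

print(mean, s, variance)
```

The variance is simply the square of the standard deviation, so reporting one is enough to recover the other.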
SPSS: levels of measurement
Nominal => categories
- Habitat, sex, colour
- Prey species killed
Ordinal => order
- Abundant, frequent, rare
- Droppings
Scale => ratio scale
- Absolute 0 (weight, length, intake)
- Can add, subtract, multiply & divide
- Fish size & body mass
Distribution type
Normal => symmetric & continuous
Lognormal => skewed & continuous
- Exponential growth (biomass)
Poisson & negative binomial => skewed, non continuous (discrete)
- Counts (quadrats)
Binomial => 2 outcomes
- Dead/alive, present/absent
T-tests
Independent samples (df = N – 2, N = total number of observations)
1. Unequal variances
2. Equal variances
Dependent samples (df = N – 1, N = number of pairs)
3. Paired sample
Hypothesis => testable explanation of observation
Clear direction: larger/smaller, increase/decrease
Based on observations + what you already know to be true
If … then …
Independent variable (x) = cause
Dependent var (y) = effect
Example:
- Ho => no difference
- H1 => mean body weight larger in area B than area A
Standard error of the mean (SE) = standard deviation of X̄ = s/√N
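A small sketch of SE = s/√N, showing why a larger N gives a more precise mean. The value of s is hypothetical and `standard_error` is a helper name introduced here for illustration.

```python
import math

def standard_error(s, n):
    """Standard error of the mean: s / sqrt(N)."""
    return s / math.sqrt(n)

s = 2.0  # hypothetical sample standard deviation
for n in (4, 16, 64):
    # Each 4-fold increase in N halves the SE.
    print(n, standard_error(s, n))
```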
T-test unequal variances (1)
t = (difference between 2 means) / (SE of that difference)
- If X̄1 – X̄2 = 0 => t = 0
- If X̄1 – X̄2 is large or s²/N decreases => t increases
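The t formula for unequal variances can be sketched in plain Python as below. The data are made up, `welch_t` is a hypothetical helper name, and the Welch-Satterthwaite df used here is an assumption that is not stated in these notes (statistics packages typically report an adjusted df for this test).

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for the unequal-variances t-test.

    df uses the Welch-Satterthwaite approximation (an addition here,
    not taken from the notes).
    """
    va = statistics.variance(a) / len(a)  # s1² / N1
    vb = statistics.variance(b) / len(b)  # s2² / N2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

area_a = [10.0, 12.0, 11.0, 13.0, 14.0]  # made-up body weights, area A
area_b = [15.0, 18.0, 14.0, 20.0, 23.0]  # made-up body weights, area B

t, df = welch_t(area_b, area_a)
print(round(t, 3), round(df, 2))
```

Two samples with identical means give t = 0, and t grows when the difference in means grows or when s²/N shrinks.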
Threshold: p = 0.05!!
- p ≤ 0.05 => reject Ho
- p > 0.05 => do not reject Ho
T-test for equal variances (2)
t = (difference between 2 means) / (SE of that difference), with SE based on the pooled variance
Levene’s test
- Ho: equal variances (p > 0.05 => do not reject Ho, use equal-variances t-test)
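For the equal-variances t-test, a minimal sketch using a pooled variance and df = N – 2 might look like this. The data are made up and `pooled_t` is a hypothetical helper name; Levene's test itself is not implemented here.

```python
import math
import statistics

def pooled_t(a, b):
    """Return (t, df) for the equal-variances t-test."""
    n1, n2 = len(a), len(b)
    # Pooled variance: variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * statistics.variance(a)
           + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # SE of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, n1 + n2 - 2

area_a = [10.0, 12.0, 11.0, 13.0, 14.0]  # made-up body weights, area A
area_b = [15.0, 18.0, 14.0, 20.0, 23.0]  # made-up body weights, area B

t, df = pooled_t(area_b, area_a)
print(round(t, 3), df)
```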
Degrees of freedom (df)
Df < N
- Depends on test & data
Significant outcome if: |calculated t| > critical t from table
Reporting statistics
1. Test used
2. Statistical parameter value (e.g. t-value)
3. Df/N
4. P-value
One-sided vs two-sided
One-tailed test reaches significance sooner
- But: only use if you expect beforehand that 1 group has the higher mean (e.g. more mortality in polluted area)
Ecology: lot of noise & uncertainty => often two-tailed
- In doubt: always use two-tailed
- Sometimes one-tailed (F-test)
Type 1 & type 2 error
Type 1 => reject Ho while true
- E.g. conclude someone is pregnant who is not
Type 2 => do not reject Ho while false
T-test for paired data (3)
Example:
- Same animal measured 2x (at two moments in time)
z̄ = average value of difference
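A paired t-test works on the differences within each pair. The sketch below assumes the usual form t = z̄ / (s_z/√N) with df = N – 1; the measurements are made up and `paired_t` is a hypothetical helper name.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for a paired-sample t-test on second - first."""
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    z_bar = statistics.mean(diffs)               # average difference (z̄)
    se = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return z_bar / se, n - 1

# Made-up weights of the same 5 animals at two moments in time.
moment_1 = [10.0, 12.0, 9.0, 11.0, 13.0]
moment_2 = [12.0, 15.0, 10.0, 14.0, 14.0]

t, df = paired_t(moment_1, moment_2)
print(round(t, 3), df)
```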
Power & sample size
Power => probability that the test correctly rejects a false Ho (= 1 – type II error rate)
- Smaller type II error => larger power than visually inspected
Larger power implications for:
- Experimental design
- Sample size
- Test results
Minimum sample size calculation
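The notes do not give a formula for the minimum sample size. One common approximation for comparing two means (an assumption here, based on the normal distribution rather than the t distribution) is n per group = 2·((z₁₋α/₂ + z_power)·σ/δ)², where δ is the smallest difference worth detecting. `sample_size_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate N per group for a two-sided, two-sample comparison.

    Normal approximation; not a formula taken from the notes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical: detect a difference of 2 g when s is about 2 g.
print(sample_size_per_group(delta=2.0, sigma=2.0))
# A smaller difference needs many more samples.
print(sample_size_per_group(delta=1.0, sigma=2.0))
```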
Non-parametric & transformation
Parametric test: assumes data distribution is characterized by mean, SD
- Only use (transformed) normally distributed data (black line)
- Higher power than non-parametric test
Not normally distributed (red line) => transform data OR use non-parametric test
- Preferably transform, only use non-parametric test as last option
Test for normality
- Histogram
- Statistical test: Shapiro-Wilk test
Data transformation
Check:
- Variance (s²), mean (X̄) & histogram
Rules of thumb (no hard guidelines)
- s² > X̄ => log(var) or ln(var), e.g. growth, biomass
- s² ≈ X̄ => √var, e.g. area, size
- Highly skewed => √√var (fourth root)
- Binomial => ln(p/(1–p)), e.g. presence/absence (binomial data)
0 values
- Use log(x+1) or ln(x+1)
- √(var + 0.5)
Transform whole var => check again!
- Still not normal: use non-parametric test (can also be used for normal data)
- Parametric test has higher power
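The transformations above can be sketched as small Python functions. The function names are hypothetical and the counts are made up; base-10 log is used here, but ln works the same way.

```python
import math

def log_transform(x):
    """log(x + 1): usable when the data contain 0 values."""
    return math.log10(x + 1)

def sqrt_transform(x):
    """sqrt(x + 0.5): square-root transform that tolerates 0 values."""
    return math.sqrt(x + 0.5)

def fourth_root(x):
    """Double square root, for highly skewed data."""
    return math.sqrt(math.sqrt(x))

def logit(p):
    """ln(p / (1 - p)) for proportions strictly between 0 and 1."""
    return math.log(p / (1 - p))

counts = [0, 1, 9, 99]  # made-up skewed counts
# Transform the whole variable, then check the histogram again.
print([log_transform(x) for x in counts])
```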
Non-parametric tests
Mann Whitney U test
Wilcoxon matched pairs test
Kruskal Wallis test
Mann Whitney U test (1)
Not normal & unpaired (independent) samples
Ho: 2 medians are equal
U value for 2 groups
- Compare smallest U value with table
- U < critical value => reject Ho
- Tied data? (same value & rank) => use asymptotic p
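How the U value comes from ranks can be shown with a short sketch. It assumes the standard rank-sum form U1 = R1 – n1(n1 + 1)/2 with average ranks for ties; the data are made up and both function names are hypothetical. The critical value still has to come from a table.

```python
def average_ranks(values):
    """Rank values 1..N; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return the smallest of the two U values for groups a and b."""
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:len(a)])               # rank sum of group a
    u1 = r1 - len(a) * (len(a) + 1) / 2
    u2 = len(a) * len(b) - u1
    return min(u1, u2)

group_a = [3, 5, 8, 10]       # made-up measurements
group_b = [6, 9, 12, 15, 20]  # made-up measurements
print(mann_whitney_u(group_a, group_b))
```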
Wilcoxon matched pairs test (2)