Recap Test Models
P-value => assuming H0 is true, the chance of observing a difference at least as extreme as the one in your sample
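To make the definition concrete, the R sketch below computes a two-sided p-value for a one-sample t-test by hand and checks it against `t.test()`. It then approximates the same p-value by simulating many samples for which H0 is true and counting how often the test statistic is at least as extreme as the observed one. The data vector `x` and the null value `mu0 = 5` are made up for illustration only.

```r
# Hypothetical sample and null hypothesis value (made up for illustration)
x   <- c(5.2, 4.8, 6.1, 5.9, 5.4, 6.3, 5.0, 5.7)
mu0 <- 5
n   <- length(x)

# Observed t statistic: distance of the sample mean from H0 in units of SE
t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value: probability under H0 of a |t| at least as large as |t_obs|
p_manual  <- 2 * pt(-abs(t_obs), df = n - 1)
p_builtin <- t.test(x, mu = mu0)$p.value

# Simulation view: draw many samples where H0 is true and count how often
# the statistic is at least as extreme as the observed one
set.seed(42)
t_sim <- replicate(20000, {
  y <- rnorm(n, mean = mu0, sd = sd(x))
  (mean(y) - mu0) / (sd(y) / sqrt(n))
})
p_sim <- mean(abs(t_sim) >= abs(t_obs))

print(c(manual = p_manual, t.test = p_builtin, simulated = p_sim))
```

The manual and built-in values agree exactly; the simulated value only approximates them and gets closer as the number of replicates grows.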
Normal distribution => continuous outcomes with a central tendency
Poisson distribution => models independent counts (e.g. events per time interval)
Binomial distribution => binary outcomes and proportions (successes out of n trials)
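A small R sketch of what each distribution produces, using base R's random generators `rnorm()`, `rpois()` and `rbinom()`. The parameter values (mean 170, lambda 3, prob 0.3) are arbitrary choices for illustration, not values from these notes. The printout shows the typical signatures: Poisson counts have a variance close to their mean, and 0/1 binomial draws average to the success proportion.

```r
set.seed(1)
n_draws <- 10000

# Normal: continuous values spread symmetrically around a centre (arbitrary parameters)
normal_draws <- rnorm(n_draws, mean = 170, sd = 10)

# Poisson: non-negative integer counts; mean and variance both equal lambda
poisson_draws <- rpois(n_draws, lambda = 3)

# Binomial: successes out of `size` trials; with size = 1 each draw is a
# binary 0/1 outcome and the mean estimates the proportion of successes
binomial_draws <- rbinom(n_draws, size = 1, prob = 0.3)

print(c(normal_mean         = mean(normal_draws),
        poisson_mean        = mean(poisson_draws),
        poisson_var         = var(poisson_draws),
        binomial_proportion = mean(binomial_draws)))
```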
Standard Error (SE) => measures how much the sample mean (average) is expected to vary from the true population mean across repeated samples. It is calculated as SE = SD / sqrt(n), so it shrinks as the sample size n grows. It tells us how reliable our sample mean is as an estimate of the population mean: a smaller standard error suggests a more reliable estimate, while a larger one indicates more uncertainty.
Standard deviation (SD) => tells us how spread out or varied a set of data points is from the
average (mean). It helps us understand the degree of variability or dispersion in a dataset. A
larger standard deviation means the data points are more spread out, while a smaller one
indicates they are closer to the mean.
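The difference between SD and SE can be checked in a few lines of R. The sketch below, using made-up measurements in `x`, computes both for one sample, then simulates many samples of size `n = 25` from a normal distribution with an assumed `sigma = 2` to show that the spread of the sample means approaches sigma / sqrt(n).

```r
# Hypothetical measurements (made up for illustration)
x <- c(4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4)

sd_x <- sd(x)                   # spread of the individual data points
se_x <- sd_x / sqrt(length(x))  # expected spread of the sample mean

# Simulation check (assumed sigma and n): the SD of many sample means
# should be close to sigma / sqrt(n)
set.seed(7)
sigma <- 2
n <- 25
sample_means <- replicate(10000, mean(rnorm(n, mean = 10, sd = sigma)))

print(c(sd = sd_x, se = se_x,
        sd_of_means = sd(sample_means), theory = sigma / sqrt(n)))
```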
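Multiple testing => every extra test is another chance of a false positive, so running many tests inflates the overall false-positive rate => correction needed
o Bonferroni => simple but conservative; suited to big sample sizes and large effects.
p.adjust(p_values, method = "bonferroni")  # p_values = vector of raw p-values
o FDR (false discovery rate) => more powerful; suited to limited sample sizes and small effects.
p.adjust(p_values, method = "fdr")  # in R, "fdr" is the Benjamini-Hochberg method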
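A worked R example of both corrections, using eight made-up p-values. It applies `p.adjust()` and also reproduces each method by hand: Bonferroni multiplies every p-value by the number of tests, while Benjamini-Hochberg scales the i-th smallest p-value by m / i and then takes running minimums so the adjusted values stay in order. The helper `bh_manual()` is a hypothetical name written for this illustration.

```r
# Hypothetical raw p-values from eight separate tests (made up for illustration)
p_values <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205)
alpha <- 0.05

bonf <- p.adjust(p_values, method = "bonferroni")
fdr  <- p.adjust(p_values, method = "fdr")   # "fdr" is an alias for "BH"

# Bonferroni by hand: multiply each p-value by the number of tests, cap at 1
bonf_manual <- pmin(1, p_values * length(p_values))

# Benjamini-Hochberg by hand (hypothetical helper): scale the i-th smallest
# p-value by m / i, then take running minimums from the largest p-value down
bh_manual <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)
  adjusted <- pmin(1, cummin(p[o] * m / (m:1)))
  adjusted[order(o)]   # back to the original order
}

print(data.frame(raw = p_values, bonferroni = bonf, fdr = fdr))
cat("Significant at alpha = 0.05:",
    "raw", sum(p_values < alpha),
    "| Bonferroni", sum(bonf < alpha),
    "| FDR", sum(fdr < alpha), "\n")
```

With these made-up numbers, five raw p-values fall below 0.05, Bonferroni keeps one and FDR keeps two, which illustrates why FDR is described as the more powerful option.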
Short overview of which test to choose
(Name => what you want to do. Assumptions: ... Subtypes share the assumptions of their parent test unless stated otherwise.)
One Sample T-Test => compare one group mean with a theoretical value. Assumptions: normally distributed data.
o One-Sided => the group mean is smaller than, or bigger than, the theoretical value (one direction).
o Two-Sided => the group mean is different from the theoretical value (either direction).
Two Sample T-Test => compare two group means. Assumptions: normally distributed data, homogeneity of variance, independent data.
o One-Sided => group mean 1 is smaller than, or bigger than, group mean 2 (one direction).
o Two-Sided => group mean 1 is different from group mean 2 (either direction).
o Unequal Variance => the groups differ in variance from each other. Assumptions: normally distributed data, heterogeneity of variance, independent data.
o Equal Variance => the groups do not differ in variance from each other.
o Independent Samples => two independent groups.
o Paired Samples => two related groups. Assumptions: normally distributed within-pair differences, dependent (paired) data.
Non-parametric alternative to the T-Test => compare two groups (same subtypes as the two sample t-test). Use it when the data are not normally distributed.
Chi-Squared Test => compare observed with expected values of count data. Assumptions: the data are counts.
One-way ANOVA => one continuous outcome and one categorical explanatory variable (a factor with multiple levels). Assumptions: linearity, normally distributed residuals, homogeneity of variance, no outliers, independent data.
Two-Way ANOVA => one continuous outcome and two categorical explanatory variables (factors with multiple levels). Assumptions: as for one-way ANOVA.
Multi-way ANOVA => one continuous outcome and three or more categorical explanatory variables (factors with multiple levels). Assumptions: as for one-way ANOVA.
Simple Linear Regression => a continuous explanatory variable and a continuous outcome. Assumptions: linearity, normally distributed residuals.
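Each row of the overview above maps onto a base-R call from the `stats` package. In the sketch below, the `mu`, `alternative`, `var.equal` and `paired` arguments of `t.test()` select the t-test subtype (R defaults to the unequal-variance Welch test), `wilcox.test()` serves as the non-parametric alternative (rank-sum for independent groups, signed-rank for paired groups), and `chisq.test()`, `aov()` and `lm()` cover the remaining rows. All data are simulated and all variable names are hypothetical; this is an illustration of the calls, not an analysis.

```r
# All data below are simulated for illustration only
set.seed(123)
group_a <- rnorm(20, mean = 10, sd = 2)
group_b <- rnorm(20, mean = 12, sd = 2)
before  <- rnorm(15, mean = 50, sd = 5)
after   <- before + rnorm(15, mean = 2, sd = 2)   # same subjects, measured twice

# One sample t-test against a theoretical value of 10
one_two_sided <- t.test(group_a, mu = 10)                          # two-sided (default)
one_greater   <- t.test(group_a, mu = 10, alternative = "greater") # one-sided

# Two sample t-tests on independent groups
equal_var   <- t.test(group_a, group_b, var.equal = TRUE)  # Student: equal variance
unequal_var <- t.test(group_a, group_b)                    # Welch: R's default
one_sided   <- t.test(group_a, group_b, alternative = "less")

# Paired t-test: equivalent to a one sample t-test on the differences
paired <- t.test(after, before, paired = TRUE)

# Non-parametric alternatives (Wilcoxon tests)
rank_sum    <- wilcox.test(group_a, group_b)              # independent groups
signed_rank <- wilcox.test(after, before, paired = TRUE)  # paired groups

# Chi-squared goodness-of-fit: observed counts vs expected proportions
observed <- c(30, 50, 20)
chi <- chisq.test(observed, p = c(0.25, 0.50, 0.25))

# One-way and two-way ANOVA on a balanced design
dat <- data.frame(
  y  = rnorm(36, mean = 20, sd = 3),
  f1 = factor(rep(c("A", "B", "C"), each = 12)),
  f2 = factor(rep(c("low", "high"), times = 18))
)
one_way <- aov(y ~ f1, data = dat)
two_way <- aov(y ~ f1 * f2, data = dat)   # main effects plus interaction
print(summary(one_way))
print(summary(two_way))

# Quick assumption checks for the one-way ANOVA
print(shapiro.test(residuals(one_way)))   # normality of residuals
print(bartlett.test(y ~ f1, data = dat))  # homogeneity of variance

# Simple linear regression
x_reg <- 1:30
y_reg <- 3 + 0.5 * x_reg + rnorm(30, sd = 1)
fit <- lm(y_reg ~ x_reg)
print(summary(fit))
```

Note that the paired t-test gives exactly the same statistic and p-value as a one sample t-test on `after - before`, which is why its normality assumption concerns the differences.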