Before you start:
- Create a folder (directory) on your PC to hold all files related to this practical.
- Make this folder your working directory:
setwd("C:/wherever your files are")
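A quick way to confirm that the working directory is set correctly is sketched below. It uses only base R functions; the file name is the one used later in this practical, and the check returns FALSE if the file has not yet been saved in that folder.

```r
# Show the current working directory
getwd()

# List the files R can see in that directory
list.files()

# TRUE only if the dataset has been saved in the working directory
file.exists("ozoneLong.txt")
```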
1. Demo: the independent-samples t-test
Data in long format: ozoneLong.txt
Read in the dataset ozoneLong.txt. This small dataset contains measurements of ozone (O3) in
two gardens, labelled A and B.
Wide format: the concentrations for the two gardens are in two separate columns. The data are not
paired; the observations are independent.
Long format: the same data arranged differently, with one column that tells you which garden an
observation comes from and one column that gives the ozone value.
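The difference between the two layouts can be sketched as follows. The numbers are made up purely for illustration and are not the contents of ozoneLong.txt; the column names ozone and garden are the ones used later in this practical.

```r
# Made-up values for illustration only; NOT the real measurements
wide <- data.frame(A = c(3, 4, 4), B = c(5, 6, 7))
wide

# stack() turns the wide layout into the long layout:
# one column with the values, one column with the original column name
long <- stack(wide)
names(long) <- c("ozone", "garden")
long
```

Each row of the long version is one observation, which is the layout the formula interface of the test functions expects.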
Here we want to know whether the ozone concentration differs between the two gardens. The
null hypothesis is that there is no difference in mean ozone concentration between the gardens.
- Parametric tests, e.g. the t-test
o Only allowed if you have enough observations and the data are approximately normally
distributed
- If the conditions for parametric testing are not fulfilled, use the Mann-Whitney U test
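One possible way to check the normality condition and to run the non-parametric alternative is sketched below. The data frame is simulated as a stand-in (it is not the real dataset), and using the Shapiro-Wilk test as the normality check is an assumption; the practical does not prescribe a particular check.

```r
set.seed(1)

# Simulated stand-in data (NOT the real measurements)
myData <- data.frame(
  garden = factor(rep(c("A", "B"), each = 10)),
  ozone  = c(rnorm(10, mean = 3), rnorm(10, mean = 5))
)

# Shapiro-Wilk normality test, run separately for each garden
normality <- lapply(split(myData$ozone, myData$garden), shapiro.test)
sapply(normality, function(res) res$p.value)

# Mann-Whitney U test; in R this is wilcox.test() with two groups
mw <- wilcox.test(ozone ~ garden, data = myData)
mw$p.value
```

With small samples a normality test has little power, so a visual check (for example qqnorm() per garden) is a useful complement.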
Use the formula interface: t.test(continuousVar ~ groupingVar, data = myData)
You will see this pattern often: a formula is usually the first argument of a function that performs
a statistical test. Generally it is y ~ x, with y the dependent variable (in t.test the continuous
variable, here ozone) and x the variable that defines the groups (which group an observation comes
from). The data argument is the data frame that contains both variables.
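A minimal sketch of the formula interface is given below. So that it runs without the file, it uses a simulated data frame as a stand-in; the commented read.table() line assumes ozoneLong.txt has a header row and whitespace-separated columns named garden and ozone, which should be checked against the actual file.

```r
set.seed(42)

# Simulated stand-in data (NOT the real measurements)
myData <- data.frame(
  garden = factor(rep(c("A", "B"), each = 10)),
  ozone  = c(rnorm(10, mean = 3), rnorm(10, mean = 5))
)
# With the real file (assumed layout: header row, whitespace-separated):
# myData <- read.table("ozoneLong.txt", header = TRUE)

# y ~ x: continuous variable on the left, grouping variable on the right
res <- t.test(ozone ~ garden, data = myData)
res

# Individual parts of the result can be extracted by name
res$p.value   # p-value of the test
res$estimate  # mean of each group
```

By default t.test() performs the Welch test, which does not assume equal variances; setting var.equal = TRUE gives the classical Student t-test.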
First explore the data:
- How are the data organized? Check the Environment pane in RStudio, or use the str() function.
What do the two variables mean?
- Generate a plot to visualize the ozone concentration in the two gardens
plot(myData$garden, myData$ozone)
This automatically gives a boxplot, provided garden is stored as a factor (if it was read in as
text, convert it first with factor()).
The ozone concentrations look higher in garden B than in garden A.
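The exploration steps above can be sketched as follows, again on a simulated stand-in data frame rather than the real file. Calling boxplot() with a formula is shown as an alternative to plot(); it produces a boxplot regardless of whether garden is a factor or text.

```r
set.seed(42)

# Simulated stand-in data (NOT the real measurements)
myData <- data.frame(
  garden = factor(rep(c("A", "B"), each = 10)),
  ozone  = c(rnorm(10, mean = 3), rnorm(10, mean = 5))
)

# Structure: variable names, types and first values
str(myData)

# Number of observations per garden
table(myData$garden)

# Summary statistics of ozone per garden
tapply(myData$ozone, myData$garden, summary)

# Boxplot of ozone by garden, using the same formula notation as t.test()
boxplot(ozone ~ garden, data = myData,
        xlab = "Garden", ylab = "Ozone concentration")
```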