Statistical methods
Normal distribution:
When data are symmetrical around the central score
Mean, median and mode are equal
Data should fit a bell-shaped ‘Gaussian curve’
Skewed distributions:
Distributions can be positively skewed, negatively skewed or symmetrical
Calculating skew
o Pearson’s coefficient of skew can be calculated from the mean, the median and the standard deviation:
Skew = 3 × (mean - median) / standard deviation
o Interpretation
If skew is <0, the data is negatively skewed
If skew is >0, the data is positively skewed
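The skew formula and its interpretation can be checked with a minimal Python sketch. The helper name `pearson_skew` and the example scores are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption because the notes do not say which SD to use.

```python
import statistics

def pearson_skew(data):
    """Pearson's median-based coefficient of skew: 3 * (mean - median) / SD.

    Assumption: uses the sample SD (n - 1 denominator).
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)
    return 3 * (mean - median) / sd

# Hypothetical scores with one unusually high value (a long right tail)
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]
skew = pearson_skew(scores)

# Apply the interpretation rules from the notes
if skew < 0:
    print(f"skew = {skew:.2f}: negatively skewed")
elif skew > 0:
    print(f"skew = {skew:.2f}: positively skewed")
else:
    print("skew = 0: symmetrical")
```

Here the single high score pulls the mean (5) above the median (4), so the coefficient comes out positive.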
Testing for distribution
Normality tests, e.g. Shapiro-Wilk, Kolmogorov-Smirnov
These simply ask: “Is your data normal?” (yes/no). A significant result (e.g. p < .05) suggests the data are not normally distributed
Gaussian curve
From the mean and standard deviation of the data alone, we can predict the value of y (the height of the curve) for any value of x
This has big implications, as most of our tests assume normally distributed data
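The idea that the mean and SD alone fix the curve can be sketched in Python using the normal density formula, y = 1 / (SD × √(2π)) × exp(-(x - mean)² / (2 × SD²)). The helper `gaussian_pdf` and the values mean = 100, SD = 15 are hypothetical choices for illustration; the result is cross-checked against the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def gaussian_pdf(x, mean, sd):
    """Height (y) of the normal curve at x, computed from the mean and SD only."""
    coeff = 1 / (sd * math.sqrt(2 * math.pi))
    exponent = -((x - mean) ** 2) / (2 * sd ** 2)
    return coeff * math.exp(exponent)

# Hypothetical mean and SD, chosen for illustration only
mean, sd = 100, 15
for x in (70, 85, 100, 115, 130):
    y = gaussian_pdf(x, mean, sd)
    # Cross-check against the standard library's implementation
    assert math.isclose(y, NormalDist(mean, sd).pdf(x))
    print(f"x = {x}: y = {y:.5f}")
```

The curve is symmetrical, so x = 85 and x = 115 (one SD either side of the mean) give the same y.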
Why is the distribution shape important?
Parametric tests assume that values such as the mean and standard deviation accurately reflect the population distribution.
o 68% of the population are within mean ± 1 SD
o Approximately 95% of the population are within mean ± 2 SD
o 99.7% of the population are within mean ± 3 SD
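These percentages can be verified with a short Python sketch using the standard library's `statistics.NormalDist`. The helper `proportion_within` and the parameters (mean 100, SD 15) are hypothetical; any mean and SD give the same proportions.

```python
from statistics import NormalDist

def proportion_within(dist, k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    # Area to the left of the upper bound minus area to the left of the lower bound
    return dist.cdf(upper) - dist.cdf(lower)

# Hypothetical population parameters, for illustration only
dist = NormalDist(mu=100, sigma=15)

for k in (1, 2, 3):
    print(f"mean +/- {k} SD: {proportion_within(dist, k):.1%}")
```

The printed values are about 68.3%, 95.4% and 99.7%, which is why ±2 SD is described as approximately 95%.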
Transforming data into z scores
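This standardises data onto a common scale (mean 0, SD 1). Because it is a linear transformation, it does not change the shape of the distribution, so it does not remove skew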
o z = (individual score - group mean) / standard deviation
This tells us exactly how many standard deviations a score lies above (positive z) or below (negative z) the mean
o 68% of the population are within a z score of ±1
o 95% of the population are within a z score of ±1.96 (rounded up to 2)
o 99.7% of the population are within a z score of ±3
Using a standard z table: the values represent the proportion of the distribution to the left of (below) the individual score
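Converting a raw score to z and then "looking it up" in a z table can be sketched in Python. `NormalDist().cdf(z)` returns the same left-tail proportion that a printed z table gives, assuming the scores are normally distributed. The helper `z_score` and the group of scores are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def z_score(x, group_mean, group_sd):
    """Number of SDs that x lies above (+) or below (-) the group mean."""
    return (x - group_mean) / group_sd

# Hypothetical group of scores, for illustration only
group = [52, 58, 61, 64, 65, 67, 70, 73, 78, 82]
m, sd = mean(group), stdev(group)

z = z_score(73, m, sd)
standard_normal = NormalDist()            # mean 0, SD 1
left_proportion = standard_normal.cdf(z)  # the value a z table reports
print(f"z = {z:.2f}; proportion below this score = {left_proportion:.3f}")

# The z value that cuts off the middle 95% (2.5% in each tail)
print(f"z for 95%: +/-{standard_normal.inv_cdf(0.975):.2f}")
```

The second line prints ±1.96, matching the cut-off in the notes.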
Pros of z scores
o Can transform data to a standardised scale
o If the raw data are normal, the z scores follow the standard normal distribution (mean 0, SD 1)
o Can compare things relative to their own population
o Uses the entire data set
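Comparing scores relative to their own population can be illustrated with a small Python sketch. The two modules, their marks, means and SDs are invented for illustration, and `z_score` is a hypothetical helper.

```python
def z_score(x, group_mean, group_sd):
    """Number of SDs that x lies above (+) or below (-) the group mean."""
    return (x - group_mean) / group_sd

# Hypothetical: one student's marks in two modules with different class averages and spreads
stats_z = z_score(68, group_mean=60, group_sd=4)
biology_z = z_score(75, group_mean=70, group_sd=10)

print(f"statistics: z = {stats_z:.2f}")
print(f"biology:    z = {biology_z:.2f}")
```

The biology raw mark is higher (75 vs 68), but relative to each class the statistics mark is better: 2 SDs above its mean compared with 0.5 SD.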