Probability and distributions
Part 1
How data can be distributed
Normal distribution – Carl Friedrich Gauss – 1777-1855
When data is symmetrical around central scores
Mean, median and mode are equal
Data should fit along a ‘Gaussian curve’
Many kinds of data approximately fit a normal distribution
o Height
o Shoe size
o Birth weight
o IQ
Distributions and skew
Skew can be calculated using Pearson’s coefficient of skew, which uses the
mean, the median and the standard deviation
o Skew = 3(mean – median) / standard deviation
Interpretation
o If skew is <0, data = negatively skewed
o If skew is >0, data = positively skewed
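A minimal Python sketch of Pearson’s coefficient of skew as written above. The
function name and the example scores are made up for illustration, and the
sample SD is used as an assumption, since the notes do not say whether the
sample or population SD is meant.

```python
import statistics

def pearson_skew(values):
    """Pearson's coefficient of skew: 3 * (mean - median) / SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample SD (assumption, see lead-in)
    return 3 * (mean - median) / sd

def describe_skew(skew):
    """Apply the interpretation rule: <0 is negative skew, >0 is positive skew."""
    if skew < 0:
        return "negatively skewed"
    if skew > 0:
        return "positively skewed"
    return "no skew"

# Hypothetical scores with a long tail of high values
scores = [1, 2, 2, 3, 3, 3, 4, 4, 20]
print(describe_skew(pearson_skew(scores)))
```

The single high score pulls the mean above the median, so the coefficient
comes out above zero.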
Testing for distribution
o Normality tests – Shapiro-Wilk, Kolmogorov-Smirnov
o These tests ask: is your data normal? Once the data has been entered into
the test, the result is read as a yes or no answer
o Or you can just plot it on a graph and eyeball it
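To give a feel for what a normality test measures, here is a rough Python
sketch of the Kolmogorov-Smirnov statistic: the largest gap between the data’s
cumulative distribution and that of a normal curve with the same mean and SD.
It is not a full test. It stops at the statistic and leaves the yes-or-no
decision (the p-value) to statistical software. The function name and both
example data sets are made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # normal curve with the data's mean and SD
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n up to (i+1)/n at x, so check both sides
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d

# Hypothetical data: 50 evenly spaced quantiles of a normal curve (bell-shaped)
bell_shaped = [NormalDist(100, 15).inv_cdf((i + 0.5) / 50) for i in range(50)]
# Hypothetical data: 50 evenly spaced quantiles of an exponential curve (positive skew)
skewed = [-math.log(1 - (i + 0.5) / 50) for i in range(50)]

print("bell-shaped D:", round(ks_statistic_normal(bell_shaped), 3))
print("skewed D:     ", round(ks_statistic_normal(skewed), 3))
```

A smaller statistic means the data sit closer to the normal curve, so the
skewed data set gives the larger value of the two.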
Gaussian curve
o The curve is fully defined by the mean and SD of the data, so from those two
values alone we can predict the value of any x or the value of any y.
o This has big implications, as most of our tests are based on normal
distributions
o Parametric statistical tests assume that data is normally distributed
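As an illustration of how much the mean and SD alone pin down, this Python
sketch builds a normal curve from just those two numbers and reads values off
it. The IQ figures (mean 100, SD 15) are the conventional scaling and are
assumed here only as an example. The helper name is made up.

```python
from statistics import NormalDist

# Assumed example values: IQ conventionally scaled to mean 100, SD 15
iq = NormalDist(mu=100, sigma=15)

def share_between(dist, low, high):
    """Proportion of the distribution expected between low and high."""
    return dist.cdf(high) - dist.cdf(low)

curve_height = iq.pdf(115)                   # y (height of the curve) at x = 115
within_one_sd = share_between(iq, 85, 115)   # about 68% lies within 1 SD of the mean
cutoff_95 = iq.inv_cdf(0.95)                 # x below which 95% of scores fall

print(round(curve_height, 4), round(within_one_sd, 3), round(cutoff_95, 1))
```

These predictions are only as good as the assumption that the data really are
normally distributed.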
Why is distribution shape important?
o Parametric statistical tests assume that the mean and SD accurately reflect
the population distribution; with skewed data they may not