Descriptive statistics
Discrete or continuous data?
1) Categorical
Has two or more categories with no ordering to them e.g. hair colour, job title
2) Discrete (usually ordinal, ratio or interval variables)
Takes fixed, separate values with a logical order e.g. shoe size, score out of 10
3) Continuous (usually ratio or interval variables)
Can take any value within a range, including fractional values e.g. reaction times
Frequency distributions
Categorical data – can be presented as a raw frequency or as a percentage frequency e.g. what's your least favourite A-level subject? This can be shown in a bar chart of the number or percentage of students choosing each subject
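A minimal Python sketch of a categorical frequency table, giving both the raw frequency and the percentage frequency for each category. The survey answers below are hypothetical, made up purely for illustration, and `frequency_table` is an illustrative helper rather than a standard function.

```python
from collections import Counter

# Hypothetical survey answers (made up for illustration)
answers = ["Maths", "Physics", "Maths", "History", "Chemistry", "Maths", "History", "Physics"]

def frequency_table(values):
    """Return (category, raw frequency, percentage frequency) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    # most_common() sorts by count; tied categories keep the order they were first seen
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]

for category, count, pct in frequency_table(answers):
    print(f"{category:<10} {count:>3} {pct:5.1f}%")
```

The rows map directly onto the bars of a bar chart: one bar per category, with its height set to either the count or the percentage.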
Discrete data – can be presented as a frequency or cumulative frequency, or as a percentage e.g. how did students score on a test? If there are many possible values, group them into frequency ranges instead e.g. scores 1-2, 3-4, etc.
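A short Python sketch showing both presentations for discrete data: a cumulative frequency table (with cumulative percentages) and grouped frequencies in ranges of two scores. The scores out of 10 are hypothetical, and the helper names and the default range width are illustrative assumptions.

```python
from collections import Counter

# Hypothetical test scores out of 10 (made up for illustration)
scores = [7, 8, 6, 9, 8, 5, 7, 8, 10, 6, 7, 8]

def cumulative_frequencies(values):
    """Return (score, frequency, cumulative frequency, cumulative %) in ascending score order."""
    counts = Counter(values)
    total = len(values)
    rows, running = [], 0
    for score in sorted(counts):
        running += counts[score]  # running total of all scores up to and including this one
        rows.append((score, counts[score], running, 100 * running / total))
    return rows

def grouped_frequencies(values, width=2, low=1, high=10):
    """Count values falling in ranges of the given width, e.g. 1-2, 3-4, ..."""
    groups = {}
    for start in range(low, high + 1, width):
        end = min(start + width - 1, high)
        groups[f"{start}-{end}"] = sum(start <= v <= end for v in values)
    return groups

for row in cumulative_frequencies(scores):
    print(row)
print(grouped_frequencies(scores))
```

The last cumulative frequency always equals the total number of scores (100%), which is a quick check that no values were lost.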
Measures of central tendency
Sometimes we want to condense an entire frequency distribution into a single number
This is where we calculate a measure of central tendency for the data
Mode – the most frequently occurring score in a dataset
Median – the middle score in a dataset
Mean – sum of data points / number of data points
Mode
- Most common score
- Can be used for nominal data
- Sometimes takes more than one value (bimodal and multimodal distributions)
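A minimal Python sketch of finding the mode, using the standard library's `statistics.multimode`, which returns every value tied for the highest frequency. It works on nominal data such as hair colours. The example data are hypothetical, and `describe_mode` is an illustrative helper.

```python
from statistics import multimode

def describe_mode(values):
    """Return the mode(s) of non-empty data and a label for how many there are."""
    if not values:
        raise ValueError("mode of empty data")
    modes = multimode(values)  # every value tied for the highest frequency
    # Caveat: if every value occurs exactly once, all of them are returned
    label = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    return modes, label

# Hypothetical data (made up for illustration)
print(describe_mode(["brown", "black", "brown", "blonde"]))  # (['brown'], 'unimodal')
print(describe_mode([2, 3, 3, 5, 5, 7]))                      # ([3, 5], 'bimodal')
```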
Median
- The middle value when the scores are put in order, or the mean of the two middle values if there is an even number of scores
- E.g. in this dataset there are 89 participants
- The median is the (89 + 1) / 2 = 45th value, which is 8
- Pros – insensitive to outliers, often corresponds to a real, meaningful data value, useful for ordinal data and skewed interval/ratio data
- Cons – ignores most of the information in the data (only the middle value or values count), tedious to calculate by hand for large datasets because the data must be sorted first, can’t be used with nominal data
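A small Python sketch of the median rule described above: sort the data, take the middle value when the number of scores is odd, or the mean of the two middle values when it is even. For 89 scores, the 0-based index 44 is the 45th value. The function is written out by hand for illustration; Python's `statistics.median` follows the same rule.

```python
def median(values):
    """Median of numeric data (or numerically coded ordinal data)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value, at 1-based position (n + 1) / 2
        return ordered[mid]
    # Even n: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print((89 + 1) // 2)             # 45, the position of the median for 89 scores
print(median([3, 1, 2]))         # 2
print(median([4, 1, 3, 2]))      # 2.5
print(median([1, 2, 3, 4, 100])) # 3, the outlier does not pull it upwards
```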
Mean
- Sum of values divided by the number of data points
- E.g. sum of data points = 600, number of data points = 89, so mean = 600 / 89 = 6.74
- On average, students scored 6.74 out of 10 on this test
- Pros – uses all of the data, is most appropriate for normally distributed data
- Cons – sensitive to outliers, its value is not always a possible data value (no one can score 6.74 out of 10), only meaningful for interval and ratio data
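A minimal Python sketch that reproduces the worked mean above and shows why the mean is sensitive to outliers while the median is not. The five scores and the outlier of 60 are hypothetical, made up for illustration; `arithmetic_mean` is an illustrative helper, and the median comes from the standard library's `statistics.median`.

```python
from statistics import median

def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

# Worked example from the notes: a total of 600 points over 89 participants
print(round(600 / 89, 2))  # 6.74

# Hypothetical scores (made up) plus one extreme outlier, e.g. a data-entry error
scores = [6, 7, 7, 8, 9]
with_outlier = scores + [60]
print(arithmetic_mean(scores), median(scores))              # 7.4 7
print(arithmetic_mean(with_outlier), median(with_outlier))  # mean jumps to about 16.17, median only moves to 7.5
```

A single extreme value more than doubles the mean but barely changes the median, which is why the median is preferred for skewed data.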