Topic 1: An introduction to statistics,
measurement and presentation of data
What do economists do?
• Provide specialist advice by applying economic theory and knowledge: studying data + statistics and carrying out research to uncover trends
• Analyse the data to recommend ways to improve efficiency
Types of statistics
Descriptive statistics
• Provide a way of summarising + presenting data that is more meaningful than a plain list of information
Inferential statistics
• Allow us to make inferences or draw conclusions from a data set
• Allow us to infer information about a population based on a sample of data from that population
o Estimating the impact of one economic variable on another, or predicting future economic outcomes based on past information
Why use a sample instead of a population?
• Financial + time costs of collecting data on the whole population → trade-off
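As a rough illustration of the sample-versus-population trade-off, the sketch below draws a small random sample from a larger population and compares the two means. The population is made-up data generated for this example (the income range, the sizes 10,000 and 200, and the seed are all assumptions, not figures from these notes).

```python
import random
import statistics

# Hypothetical population: 10,000 made-up weekly incomes (not real data).
rng = random.Random(42)
population = [rng.uniform(300, 2500) for _ in range(10_000)]

# Draw a simple random sample of 200 observations without replacement.
sample = rng.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Difference:      {abs(population_mean - sample_mean):.2f}")
```

The sample uses only 2% of the observations, yet its mean should usually land close to the population mean, which is why the cheaper sample is often good enough.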
Types of variables
• Qualitative
• Quantitative: discrete (takes only certain values, with gaps in between) or continuous (takes any value within a range)
Measurement of data
Nominal data
• Observations of a qualitative variable are measured + recorded as labels/names
• Order has no meaning
• Categories must be:
1. Mutually exclusive: each measurement falls in only one category
2. Exhaustive: each measurement must appear in a category
Ordinal data
• Order is meaningful but differences between data values aren’t
• Can be compared
• Data classifications are represented by sets of labels with relative values
• Ranking or ordering of categories
Interval data
• Differences between data values are meaningful
• Based on scales with known units of measurement where distances between observations are measurable
• Value of zero doesn’t imply an absence
• Ratios don’t make sense (a size 6 shoe isn’t triple a size 2 shoe)
Ratio data
• Ratios + differences are meaningful
• Inherent zero starting point
o Weight, age, income
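The difference between interval and ratio data can be checked with a little arithmetic. The sketch below uses temperature as an extra illustration (it is not an example from these notes): Celsius is an interval scale with an arbitrary zero, Kelvin has a true zero, and income is one of the ratio examples listed above. All numbers are made up.

```python
# Interval scale: Celsius has an arbitrary zero, so ratios mislead.
celsius_a, celsius_b = 10.0, 20.0
difference_c = celsius_b - celsius_a   # meaningful: 10 degrees warmer
naive_ratio = celsius_b / celsius_a    # 2.0, but 20 C is not "twice as hot"

# Kelvin starts at a true zero, so the ratio there is the meaningful one.
kelvin_a, kelvin_b = celsius_a + 273.15, celsius_b + 273.15
true_ratio = kelvin_b / kelvin_a       # only slightly above 1

# Ratio scale: income has an inherent zero, so ratios are meaningful.
income_a, income_b = 40_000, 80_000
income_ratio = income_b / income_a     # 2.0: genuinely twice the income

print(difference_c, naive_ratio, round(true_ratio, 3), income_ratio)
```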
Presentation of data
Frequency distribution table
• Groups the data into mutually exclusive, organised classes
1. Decide the number of classes using the “2 to the k rule”
• Choose the smallest k such that 2^k > N (N is the total number of observations)
2. Determine the class interval
• Needs to be at least as great as range ÷ k → round up
3. Set individual class limits
• Lower limit + (k × class interval) must exceed the largest observation (> 48 in the example data)
4. Determine the frequency for each class interval
5. Add relative frequencies (proportion of observations in each class) or cumulative frequencies
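The five steps can be sketched in code. This is a minimal illustration under stated assumptions: the 20 observations are made up, the lower limit is set to the smallest observation, and each class is treated as "lower limit up to but not including the next lower limit". None of these choices come from the notes.

```python
import math

# Hypothetical data: 20 made-up observations (not from the notes).
data = [12, 15, 17, 21, 22, 24, 25, 27, 28, 30,
        31, 33, 34, 36, 38, 40, 41, 43, 45, 47]
n = len(data)

# Step 1: smallest k such that 2**k > n.
k = 1
while 2 ** k <= n:
    k += 1

# Step 2: class interval at least (max - min) / k, rounded up.
width = math.ceil((max(data) - min(data)) / k)

# Step 3: start at the smallest value and widen the interval until
# lower + k * width exceeds the largest observation.
lower = min(data)
while lower + k * width <= max(data):
    width += 1

# Step 4: count the observations in each class [start, start + width).
frequencies = [0] * k
for x in data:
    frequencies[(x - lower) // width] += 1

# Step 5: relative and cumulative frequencies.
relative = [f / n for f in frequencies]
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

for i in range(k):
    start = lower + i * width
    print(f"{start:>3} to under {start + width:<3} "
          f"{frequencies[i]:>3} {relative[i]:>6.2f} {cumulative[i]:>3}")
```

With these numbers the range is 35, so range ÷ k gives exactly 7, but five classes of width 7 starting at 12 would stop at 47 and leave out the largest observation. That is why step 3 widens the interval to 8.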
Graphic representation of frequency distributions
Histogram, frequency polygon and cumulative frequency distribution
Histogram
• Plots frequencies using bars at each class, where the height represents the frequency
Frequency polygon
• Attaches line segments to the midpoints of each class interval of the histogram to give a quick picture of the main characteristics
Cumulative frequency plot
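The three displays are all built from the same frequency table. The sketch below uses a made-up table and plain text output rather than a plotting library. Placing each cumulative point at the upper class limit is a common convention assumed here, not something stated in these notes.

```python
# Hypothetical frequency table: (lower limit, upper limit, frequency).
classes = [(12, 20, 3), (20, 28, 5), (28, 36, 5), (36, 44, 5), (44, 52, 2)]

# Histogram: one bar per class, bar length equals the class frequency.
for low, high, freq in classes:
    print(f"{low:>2}-{high:<2} | {'#' * freq}")

# Frequency polygon: points at (class midpoint, frequency), joined by lines.
polygon_points = [((low + high) / 2, freq) for low, high, freq in classes]

# Cumulative frequency plot: points at (upper limit, running total).
cumulative_points = []
running = 0
for low, high, freq in classes:
    running += freq
    cumulative_points.append((high, running))

print(polygon_points)
print(cumulative_points)
```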
Stem-and-leaf display
• Each numerical value is divided into leading digits (stem) and trailing digits (leaf)
• Doesn’t lose the identity of each observation
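A stem-and-leaf display is simple to build by hand or in code. The sketch below assumes two-digit values, so the tens digit is the stem and the units digit is the leaf. The data are made up.

```python
from collections import defaultdict

# Hypothetical two-digit observations (not from the notes).
values = [23, 27, 31, 35, 35, 38, 42, 44, 49, 51]

# Stem = leading (tens) digit, leaf = trailing (units) digit.
stems = defaultdict(list)
for v in sorted(values):
    stems[v // 10].append(v % 10)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")

# Every observation can be rebuilt from its stem and leaf.
rebuilt = sorted(stem * 10 + leaf
                 for stem, leaf_list in stems.items()
                 for leaf in leaf_list)
```

Because `rebuilt` matches the original values, no observation loses its identity, unlike in a frequency table where only the class counts survive.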
Bar chart
• Doesn’t show developments over time
Line chart
• Useful for showing how data evolves over time
Pie chart
• Shows the proportion that each class represents of the total frequency
• Useful for displaying relative frequency distributions
• Requires categories to be exhaustive
Scatter plots
• Show the relationship between 2 variables
• Don’t establish causation
Topic 2: Describing data: measures of central
tendency and measures of dispersion
Mode
Pros
• Useful for determining where there’s a clustering of values
• Not affected by extremes
Cons
• Rest of the distribution isn’t considered
• Some distributions have no mode – uniform distribution
• Some distributions have more than one mode
• Mode may not be central to the distribution of values
Median
• If odd number of values – median is the middle value
• If even number of values – median is the average of the 2 middle values
Pros
• No multiple values (only one median)
• Not affected by extremes
• Can be computed for ratio, interval and ordinal data
Mean
Pros
• Only one mean
• Requires at least the interval scale
• Affected by any value
Cons
• Affected by extremes
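The three measures can be compared using Python's standard `statistics` module. The data below are made up. Adding one extreme value shows how the mean moves far more than the median.

```python
import statistics

# Hypothetical data (not from the notes).
values = [2, 3, 3, 5, 7]
with_outlier = values + [100]

# Mode: multimode returns every most-common value, so a data set
# with several modes gives a list with several entries.
modes = statistics.multimode(values)
two_modes = statistics.multimode([1, 1, 2, 2])

# Median: middle value for an odd count, average of the two
# middle values for an even count.
median_odd = statistics.median(values)
median_even = statistics.median(with_outlier)

# Mean: uses every value, so the extreme value drags it upwards.
mean_plain = statistics.mean(values)
mean_outlier = statistics.mean(with_outlier)

print(modes, two_modes)
print(median_odd, median_even)
print(mean_plain, mean_outlier)
```

Here the extreme value moves the median from 3 to 4 but the mean from 4 to 20.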
Characteristics of the mean
1. If you subtract each value in the dataset from the mean and add up all the differences, the resultant sum is always zero – the sum of deviations from the mean is zero
2. Least squares principle: if the differences between each value and the mean are squared and summed, the sum is the minimum possible – less than the sum of the squared differences between each value and any other point in the distribution
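Both characteristics can be checked numerically. The sketch below uses a small made-up data set and a hypothetical helper, `sum_squared`, to compare the sum of squared differences at the mean with the same sum at a few other points.

```python
import statistics

values = [2, 3, 3, 5, 7]          # hypothetical data, mean is 4
mean = statistics.mean(values)

# Characteristic 1: the differences from the mean sum to zero.
deviation_sum = sum(mean - x for x in values)

# Characteristic 2: the sum of squared differences is smallest at the mean.
def sum_squared(point):
    """Sum of squared differences between each value and `point`."""
    return sum((x - point) ** 2 for x in values)

at_mean = sum_squared(mean)
at_other_points = {p: sum_squared(p) for p in (2, 3, 5, 7)}

print(deviation_sum)      # 0
print(at_mean)            # 16
print(at_other_points)    # every entry is larger than 16
```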
Skewness
• Mean = median: symmetrical distribution