techniques
Learning outcomes for this lecture
1. Understand what nonparametric statistical techniques are and when they are
useful
• Know the difference between parametric and nonparametric techniques
2. Understand the different data types and be able to classify any data into its type
3. Know how to rank a sample of data values
• Understand how to deal with “ties” in the data values
Parametric techniques
• The statistical techniques introduced up to this point have all required that we
know the underlying distribution of our sample (typically normal), or that the
sample size is large enough to rely on the normality conclusion of the central
limit theorem.
• Based on these assumptions, the sampling distributions of test statistics were
derived and we made inferences about unknown parameters.
• We have also only considered quantitative data.
• Examples: estimating the population mean μ when the population variance σ² is
unknown and the sample size n < 30, OR hypothesis tests for σ². Both ASSUME that
the underlying population is normally distributed.
• If you know the distribution of your data set, parametric tests are more powerful
(the chance of making a Type II error is smaller), but they are only more
efficient when the assumptions of the test are met.
Decision        H0 true             H0 false
Rejected        Type I error → α    (1-β)
Not rejected    (1-α)               Type II error → β
Non-parametric techniques
• Make only weak assumptions about the underlying distribution of the data, e.g.
might assume symmetry of the distribution
• Do not refer to any specific parameter of a population’s distribution.
• Independence of observations and random samples are still required.
• Most of the test statistics do not depend on the actual numerical values; they
are computed on the ranks of the data. Therefore, the specific distribution of
the data does not matter, and hence these tests are also called distribution-free
statistics.
• Are used for quantitative data that are non-normal (or whose precise distribution
is in doubt), and for nominal and ordinal (ranked categories) data.
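The point that rank-based statistics ignore the actual numerical values can be illustrated with a short Python sketch. The function name `average_ranks` is hypothetical, and giving tied values the average of the ranks they would otherwise occupy is a common convention assumed here; it is not taken from this section.

```python
import math


def average_ranks(values):
    """Rank values from smallest (rank 1) to largest.

    Assumed tie convention: tied values share the average of the ranks
    they would otherwise occupy.
    """
    # Indices of the observations, ordered by their values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) correspond to ranks start+1..end+1.
        shared_rank = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


data = [3.1, 0.5, 7.2, 3.1, 12.0]
ranks_raw = average_ranks(data)
# A strictly increasing transformation changes the values but not their order.
ranks_exp = average_ranks([math.exp(x) for x in data])
```

Here `ranks_raw` and `ranks_exp` are identical, which is the sense in which a statistic built only from ranks does not depend on the shape of the underlying distribution.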
Note:
• If we know we have normally distributed quantitative data we may still use
nonparametric techniques to analyse our data.
• However, in these cases we usually prefer to use parametric tests as they have
more statistical power, since we know the appropriate model to use for the
distribution.
• The power of a statistical test is the probability that the test will reject the null
hypothesis when the alternative hypothesis is true (i.e. the probability of not
committing a Type II error)
o i.e. nonparametric tests remain valid as long as their weak assumptions hold,
though in some circumstances they are not optimal in power.
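To make the definition of power (1-β) concrete, here is a minimal Python sketch for one simple parametric case: an upper-tailed one-sample z-test with known σ. The scenario, the function name `z_test_power`, and its parameters are assumptions chosen for illustration; they do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test of H0: mu = mu0.

    delta is the assumed true shift (true mean minus mu0), sigma the known
    population standard deviation, n the sample size, alpha the Type I
    error probability.
    """
    std_normal = NormalDist()
    # Reject H0 when the z statistic exceeds this critical value.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is normal with this mean and unit variance.
    shift = delta * sqrt(n) / sigma
    # Power = P(reject H0 | alternative true) = 1 - beta.
    return 1 - std_normal.cdf(z_crit - shift)


power_small_n = z_test_power(delta=0.5, sigma=1.0, n=25)
power_large_n = z_test_power(delta=0.5, sigma=1.0, n=100)
beta_small_n = 1 - power_small_n  # Type II error probability for n = 25
```

With `delta=0` the function returns `alpha`, since the test then rejects a true null hypothesis. Increasing `n` raises the power for a fixed shift.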
Review: data types
Data types
• Nominal data can be placed into categories, but these categories do not have a
natural order.
• Ordinal data can be placed into categories and can be ranked with respect to
some characteristic. However, we cannot interpret the differences between
ranked values, as the numbers used are arbitrary.
• Interval-scale data has all the characteristics of ordinal data AND the
differences between any two values have meaning. However, there is no
“absolute” zero point on the scale and thus ratios between values are not
meaningful.
• Ratio-scale data has all the characteristics of interval-scale data AND has a true
zero point as its origin. Hence, ratios between values are meaningful
Interval vs ratio
• Is time an interval or ratio variable?
• Time has a natural order
• Differences between points in time are comparable, so the difference between
12pm and 1pm is the same as that between 4pm and 5pm (i.e. one hour).
• BUT time does not have a meaningful zero; there is no zero time point
• So we can’t say 2pm is twice as “old of a time” as 1pm: NOT MEANINGFUL!
• Ratio is not meaningful!
• What about duration or length of time, e.g. seconds?
• There is a natural order, so 1 second is smaller than 2 seconds
• Equal distances between points, so the difference between 10 and 15 seconds
is the same as between 5 and 10 seconds.
• AND it has a true zero, 0 seconds is meaningful and thus, so are ratios (10
seconds is twice as long as 5 seconds)
• Go through the same reasoning for temperature in Celsius vs in Kelvin
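The Celsius versus Kelvin comparison can be worked through numerically. This is a minimal Python sketch; the helper name `celsius_to_kelvin` and the two example temperatures are chosen purely for illustration.

```python
def celsius_to_kelvin(temp_c):
    """Convert degrees Celsius to kelvin (a shift of the zero point only)."""
    return temp_c + 273.15


cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences agree on both scales: a 10-degree gap either way.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios do not agree: 2.0 in Celsius, but only about 1.035 in Kelvin.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k
```

The difference survives the change of scale but the ratio does not. So Celsius behaves as interval-scale data (its zero is arbitrary), while Kelvin, with a true zero, is ratio-scale: 20 °C is not "twice as hot" as 10 °C.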
Data types - Example
Determine the type of data for the following
1) The number of students in a statistics class
o Quantitative, ratio
2) The make of car driven by each of a sample of executives
o Qualitative, nominal
3) The rating (Extremely poor [1], Very Poor [2], Poor [3], Unsure [4], Good [5], Very
Good [6], Excellent [7]) reported for a particular television program by each of a
sample of viewers
o Qualitative, ordinal
4) The weekly closing price of gold throughout the year
o Quantitative, ratio
5) The month of highest sales for each firm in a sample
o Qualitative, ordinal
6) The socioeconomic status of people who reside in Cape Town (upper class,
middle class, lower class)
o Qualitative, ordinal
7) The responses by citizens on a 5-point rating scale (where 1=Strongly Disagree,
2=Disagree, 3=Unsure, 4=Agree, 5=Strongly Agree) to the statement: “South
Africa should be divided into two time zones”
o Qualitative, ordinal
8) The gender of UCT employees.
o Qualitative, nominal
9) The maximum temperature recorded in March 2013 (in °C)
o Quantitative, interval
10) The rating (excellent, good, fair or poor) given to a particular television program
by each of a sample of viewers
o Qualitative, ordinal
Overview of non-parametric tests