Methods, Measurement and Statistics
Lecture 1 Statistics and Measurement
Methods
Design a study that can answer your research question.
Measurement
How to measure social and psychological constructs.
Statistics
How to describe and analyze your data and test hypotheses.
Statistics is used to:
- Describe/summarize data (descriptive statistics)
Reduce data to understandable pieces of information
- Draw inferences about populations (inferential statistics)
In science we often want to draw conclusions about populations
- Study complex multivariate relationships (statistical modeling)
Measurement levels
Quantitative data are expressible in numbers and are often collected using questionnaires.
Basic distinction between four types of data (measurement levels):
- Nominal
Numbers express different unordered categories or groups (eye colour)
Each category gets a number, but the number doesn’t have a meaning and is just for
reference.
Example: marital status:
1 single
2 married
3 in a relationship but not married
4 complicated / not otherwise specified
Categories must be exhaustive (all possibilities are covered) and mutually exclusive
(every case fits into one category and one category only). This ensures that every respondent
can answer the question and that no respondent can reasonably pick more than one option.
- Ordinal
Numbers express different ordered categories (less/more)
Example: Smoking intensity
1 never
2 at least 1 cigarette per month
3 at least 1 per day
4 five or more per day
There is a clear order and increase across the options, but each option is still a category
rather than an exact amount, and respondents are forced to choose one of these categories.
Ordinal variables express more or less of a quantity but the difference between pairs of
categories is not necessarily the same in quantity.
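The categories should follow a logical order. The example below does not: it is unclear whether
"Often" means more or less than "Daily", so the order can be interpreted in different ways.
Example: Not logical: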
1 Never
2 Occasionally
3 Daily
4 Often
- Interval
Numbers express differences in quantity using a common unit with equal intervals between
neighboring data points, but there is no true zero point (a zero that means the quantity is absent).
Example: IQ test score. Temperature in degrees Celsius is another example: 0 degrees does not mean
that there is no temperature, it is just the point at which water starts to freeze.
The difference between 70 and 80 IQ points is comparable to a difference between 100 and
110. Both span a difference of 10 units.
An IQ score of 0 has no true meaning: it would not indicate the absence of intelligence.
- Ratio
Numbers express differences in quantity on a common unit and have a natural zero point.
Example: length, weight or income
A length, weight or income of 0 can be meaningfully interpreted.
This allows for relative comparisons (ratios).
For example, 6 degrees Celsius is not twice as hot as 3 degrees Celsius (because zero degrees is
not a true starting point), but one person can be twice as tall as another.
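The temperature comparison above can be checked with a little arithmetic. The sketch below assumes the degrees are Celsius (the notes do not name the scale, but the freezing-water remark suggests it) and converts to kelvin, a scale that does have a true zero. The function name is only illustrative.

```python
# Assumption: "degrees" in the notes means degrees Celsius.
def celsius_to_kelvin(celsius):
    """Convert to kelvin, a temperature scale with a true zero point."""
    return celsius + 273.15

naive_ratio = 6 / 3  # 2.0, but this ratio is meaningless on an interval scale
true_ratio = celsius_to_kelvin(6) / celsius_to_kelvin(3)

print(naive_ratio)           # 2.0
print(round(true_ratio, 3))  # 1.011

# Differences, in contrast, are meaningful on an interval scale:
# the 3-degree gap is the same size in both units.
gap_celsius = 6 - 3
gap_kelvin = celsius_to_kelvin(6) - celsius_to_kelvin(3)
```

On the scale with a true zero, 6 degrees turns out to be only about 1% hotter than 3 degrees, not 100% hotter.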
The levels differ in how refined or exact the measurement is:
- Nominal is the lowest level and ratio the highest.
- Measuring at a lower level is often easier but less informative.
Interval and ratio level data are scale data. All variables that are not nominal or ordinal are
treated as scale-level variables.
Ratio is the most precise level because it can differentiate more things and more specific properties.
Measurement level is a property of the measurement values, it is not an intrinsic property of
the thing you are measuring.
Example: you cannot say that intelligence has interval level
Intelligence can be measured at different levels depending on the measurement instrument.
Nominal: variable indicating someone’s intelligence type (musical/ mathematical)
Ordinal: variable indicating the highest education completed (e.g. primary school)
Interval: score resulting from an IQ test
Ratio: skull circumference in centimeters (used to do this in the past)
Measurement levels determine the kind of statistics and statistical analyses you can use
meaningfully.
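As a rough illustration of how the measurement level restricts the choice of statistic, the sketch below maps each level to the measures of central tendency that can be meaningfully used. The mode and median entries follow what these notes state further on; listing the mean for interval and ratio data only is common practice but is an assumption here, and all names are hypothetical.

```python
# Which central-tendency measures are meaningful at each measurement level.
# The "mean" entries reflect common practice (an assumption, not stated in these notes).
ALLOWED_CENTRAL_TENDENCY = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}

def is_meaningful(statistic, level):
    """Return True if the statistic can be interpreted at this measurement level."""
    return statistic in ALLOWED_CENTRAL_TENDENCY[level]

print(is_meaningful("mean", "nominal"))    # False: averaging category codes is meaningless
print(is_meaningful("median", "ordinal"))  # True
```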
Data inspection
Every analysis starts with data inspection: the goal is to get a clear picture of the data by
examining one variable at a time (univariate), or pairs of variables (bivariate).
In general we want to inspect:
- Central tendency: what are the most common values?
- Variability: how large are the differences between the subjects? Are there extreme
values in the sample? (e.g. an age of 100)
- Bivariate association: for each pair of variables, do they associate/covary/correlate
(do low/high values on variable A go together with low/high values on variable B)?
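The question of whether two variables covary can be made concrete with a correlation coefficient. The sketch below computes Pearson's r by hand for two scale variables. The notes do not name a specific coefficient, so the choice of Pearson's r, the variable names and the numbers are all illustrative assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariation of x and y scaled to the range -1..1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from each mean
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and exam grade for five students
hours = [1, 2, 3, 4, 5]
grade = [4, 5, 7, 8, 9]

r = pearson_r(hours, grade)
print(round(r, 2))  # 0.99
```

A value close to 1 means that low values on one variable go together with low values on the other, and high with high; a value close to -1 means the opposite pattern.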
Accomplishing this goal:
- Visual data inspection (graphs)
- Numerical data inspection (statistics)
Which statistics and graphs are most appropriate depends on the measurement level.
Visual data inspection (three common graph types)
- Bar charts (nominal and ordinal data)
- Histogram (scale data)
Unlike in a bar chart, not every value necessarily has a bar in a histogram (some frequencies may be 0).
Normal distribution: a symmetrical distribution in which the frequencies gradually become
lower the farther you go from the center toward the edges. For example: IQ score,
birthweight, height.
- Scatterplot
For scale data, showing 2 variables at the same time.
Important information that is not shown in the graph can lead to misleading figures
and incorrect conclusions (such as not showing age in a graph about height and
reading ability).
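To illustrate how a histogram summarizes scale data, the sketch below groups scores into equal-width bins and prints a text histogram. The bin width, the sample scores and the function name are assumptions made for this example. Note that an empty bin still appears, with a count of 0.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each equal-width bin [lower, lower + width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical IQ scores
scores = [82, 95, 97, 101, 103, 104, 108, 112, 131]
counts = histogram_counts(scores, start=80, width=10, n_bins=6)
print(counts)  # [1, 2, 4, 1, 0, 1]

# Text histogram: one row per bin, one '#' per observation
for i, count in enumerate(counts):
    lower = 80 + i * 10
    print(f"{lower:3d}-{lower + 9:3d} | {'#' * count}")
```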
Numerical data inspection
Three common statistical approaches
1. Frequency tables
How often do particular scores occur?
1 variable
Valid percent = frequency / (total sample size (N) - number of missing values) × 100
- Crosstab (cross-tabulation): 2 variables
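A frequency table with valid percentages can be sketched in a few lines. The example below uses made-up marital-status codes, with `None` standing in for a missing answer; the data and names are assumptions for illustration. The valid percent is the frequency divided by the number of non-missing cases, multiplied by 100 to express it as a percentage.

```python
from collections import Counter

# Hypothetical marital-status codes; None marks a missing answer
responses = [1, 2, 2, 3, 1, 2, None, 4, 2, None]

valid = [r for r in responses if r is not None]
frequencies = Counter(valid)

n_total = len(responses)           # N = 10
n_missing = n_total - len(valid)   # 2 missing answers

def valid_percent(frequency):
    """Percentage of the non-missing cases that have this score."""
    return 100 * frequency / (n_total - n_missing)

# One row per score: score, frequency, valid percent
for score in sorted(frequencies):
    freq = frequencies[score]
    print(score, freq, f"{valid_percent(freq):.1f}%")
```

Here score 2 occurs 4 times among 8 valid answers, so its valid percent is 50.0%, whereas its share of the full sample of 10 would be only 40%.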
2. Central tendencies
The center in the scores of your data
- Mode
The score that is observed most frequently.
Example: {3,4,5,5,5} -> mode is 5
For nominal, ordinal or scale data
- Median
The score that separates the higher half of data from the lower half of data, the exact
middle score.
Example 1: N is odd {5,6,7,8,9} -> median is 7
Example 2: N is even {5,6,8,9} -> median is 7
(the arithmetic mean of the two middle values, 6 and 8, is 7)
For ordinal or scale data that are not normally distributed
- Mean
The average of all scores: the sum of the scores divided by the number of scores.
Mean = (sum of all X) / N
X = an individual score that goes into the mean
N = number of scores that you have
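The three measures of central tendency can be checked against the examples in these notes with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, and the mean is computed as the sum of the scores divided by N.

```python
import statistics

# Mode: the most frequently observed score
print(statistics.mode([3, 4, 5, 5, 5]))    # 5

# Median with an odd N: the exact middle score
print(statistics.median([5, 6, 7, 8, 9]))  # 7

# Median with an even N: the mean of the two middle scores (6 and 8)
print(statistics.median([5, 6, 8, 9]))     # 7.0

# Mean: sum of all scores X divided by the number of scores N
scores = [5, 6, 8, 9]
mean = sum(scores) / len(scores)
print(mean)                                # 7.0
print(statistics.mean(scores) == mean)     # True
```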