INCLUDES Q&A lectures, web lectures and seminar notes from week 1-7 (Total: 45 pages).
Statistics I: Description and Inference Lecture and Seminar Notes
(Week 1-7)
Table of Contents
Week 1
Workgroup Session #1
Week 2
Workgroup Session #2
Week 3
Workgroup Session #3
Week 4
Workgroup Session #4
Week 5
Workgroup Session #5
Week 6
Workgroup Session #6
Week 7
Workgroup Session #7
Week 1
Introduction
Why statistics?
1. Make sense of collected data.
2. Discover patterns and (causal) relationships.
3. Possibilities and limitations of sampling.
4. Critical reflection on existing research.
5. Become critical observers of news.
4 Golden Rules:
1. Read (and reread) the chapters.
2. Watch the Web lectures and attend the Q&A lectures.
3. Attend and participate in all seminars.
4. Complete all the assignments and practise with materials.
Levels of Measurement
Variables
Constant: Anything that does not vary across entities or across time.
Variable: Anything that can be measured and can differ across entities or across time (e.g. hair colour).
Independent variable: The cause (often written as x).
Dependent variable: The outcome (often written as y).
The independent variable has an effect on the dependent variable (x → y).
Levels of Measurement
Variables have different scales/levels of measurement, referring to the nature of information within
values assigned to variables.
Categorical: Contain a finite number of categories or distinct groups.
1. Nominal
➔ Two or more exclusive categories.
➔ No natural order.
➔ No arithmetic operations possible (subtraction or logical operations).
➔ Can only talk about these categories in frequency (mode).
➔ E.g. political party affiliation.
2. Ordinal
➔ Clear ordering of the values (e.g. small or larger).
➔ Spacing between the values is NOT the same across levels.
➔ Comparison is possible, but only relative.
➔ E.g. level of agreement.
Continuous: Continuous variables are numeric variables for which the difference between two values is
meaningful. In the strict sense they have an infinite number of possible values between any two values,
BUT interval-ratio variables can also be discrete.
➔ A “continuous” interval-ratio variable can be measured to any level of precision (e.g.
height can be measured to any value).
➔ A “discrete” interval-ratio variable can only take certain, countable values, usually whole
numbers (e.g. points in an exam).
3. Interval
➔ The zero is arbitrary or meaningless.
➔ E.g. a temperature of 0 °C or 0 °F does not mean ‘no heat’.
4. Ratio
➔ Like interval variables, but have a meaningful zero.
➔ E.g. 0 Kelvin means no heat.
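The difference between an arbitrary and a meaningful zero can be checked with a little arithmetic. The sketch below is only an illustration: the two temperatures are made-up values and the helper `celsius_to_kelvin` is a hypothetical name, not something from the lectures. It shows that differences survive a change of scale, while ratios only make sense on the ratio scale (Kelvin).

```python
# Celsius is an interval scale (arbitrary zero); Kelvin is a ratio scale (meaningful zero).
def celsius_to_kelvin(celsius):
    """Hypothetical helper: shift the Celsius scale so that zero means 'no heat'."""
    return celsius + 273.15

# Made-up example temperatures.
cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales and stay the same (10 degrees).
difference_c = warm_c - cold_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful with a true zero:
# 20 °C is NOT "twice as hot" as 10 °C, even though 20 / 10 = 2.
ratio_c = warm_c / cold_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(f"Difference: {difference_c:.2f} °C, {difference_k:.2f} K")
print(f"Ratio in Celsius: {ratio_c:.3f} (not meaningful)")
print(f"Ratio in Kelvin:  {ratio_k:.3f} (meaningful)")
```

In Kelvin the warmer temperature is only about 3.5% higher, which is why statements such as "twice as much" should be reserved for ratio variables.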
Distributions and Measures of Central Tendency
Distribution
Once data has been collected, a distribution shows how the data values are spread in relation to one another.
➔ Frequency Distribution: A summary of a statistical data set that shows all the possible values
(or intervals) of the data and how frequently they occur.
◆ E.g. the European Social Survey
1. Nominal variable of religion or denomination at the time of interview
(specific categories, but no natural ordering of these values).
2. Ordinal variable of the level of interest in politics at the time of interview (a
natural ordering of these values: very/quite/hardly/not interested).
3. Ratio variable of the age at the time of interview.
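A frequency distribution for a categorical variable can be tabulated by counting how often each value occurs. The sketch below uses made-up answers to the political interest question (they are NOT real European Social Survey data), and the names `responses`, `order` and `frequency_table` are chosen for illustration only.

```python
from collections import Counter

# Made-up answers to "How interested are you in politics?" (ordinal variable).
responses = ["quite", "hardly", "very", "quite", "not", "quite", "hardly", "very"]

# The natural ordering of the categories, from most to least interested.
order = ["very", "quite", "hardly", "not"]

counts = Counter(responses)
total = len(responses)

# Frequency distribution: every possible value with its absolute and relative frequency.
frequency_table = [(level, counts[level], counts[level] / total) for level in order]

for level, frequency, share in frequency_table:
    print(f"{level:<8}{frequency:>3}{share:>8.1%}")
```

Listing the categories in their natural order only makes sense for ordinal (or higher) variables; for a nominal variable such as religion, any order of the rows would be equally valid.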
Describing Different Distributions
Measure of Central Tendency: A value that attempts to describe a set of data by identifying the central
position within that set of data.
➔ Mode: The most frequent score in a data set. There can be several modes.
➔ Median: The middle score for a set of data that has been arranged in order of magnitude. It is
not affected by extreme values. With an even number of scores, add the two middle numbers
together and divide by 2:
$\text{median} = \frac{n_1 + n_2}{2}$
➔ Mean: The average of the numbers. The mean is sensitive to extreme values/outliers.
Therefore, when there are extreme values in the data set, the median may be more useful than
the mean. Here:
◆ $\Sigma$: Sigma; the “sum of...”.
◆ $\bar{X}$: The “mean” (the bar) of variable x.
◆ $\sum_{i=1}^{n} x_i$: Calculate the sum of all values of x ($x_1, x_2, x_3, \ldots, x_n$).
◆ $n$: The total number of observations.
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
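The three measures of central tendency can be computed by hand in a few lines. The sketch below is a minimal illustration, assuming made-up exam scores; the functions `mean` and `median` are written out to mirror the definitions above, and `statistics.multimode` (Python 3.8+) from the standard library is used for the mode because there can be several modes.

```python
import statistics

# Made-up exam scores; the second list adds one extreme value (outlier).
scores = [4, 6, 6, 7, 8, 9]
scores_with_outlier = scores + [60]

def mean(values):
    """Sum of all values of x divided by the number of observations n."""
    return sum(values) / len(values)

def median(values):
    """Middle score of the ordered data; average of the two middle scores if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# multimode returns a list, because a data set can have several modes.
modes = statistics.multimode(scores)

print("Mode(s):", modes)
print("Median:", median(scores), "with outlier:", median(scores_with_outlier))
print(f"Mean: {mean(scores):.2f} with outlier: {mean(scores_with_outlier):.2f}")
```

With these scores the single outlier moves the median only from 6.5 to 7, while the mean roughly doubles (from about 6.67 to about 14.29), which illustrates why the median may be more useful when there are extreme values.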