WEEK 1
Introductory lecture (05.11.24)
Learning objectives for week 1:
1. Describe, calculate and interpret measures of central tendency, dispersion and relative standing of
variables and articulate and substantiate your findings and decision(s) adequately, also by using
SPSS/R;
2. Describe, determine and interpret the shape of the distribution of variables of different
measurement levels and articulate and substantiate your findings and decision(s) adequately, also
by using SPSS/R;
3. Substantiate conclusions and decisions involved with learning goals 1 and 2 adequately
Video 1.1:
Statistics = collecting, classifying, summarizing, organizing, analyzing & interpreting numerical information
Empirical cycle → kind of like a research plan
- Statistics is about analyzing data: a means to answer RQs & test hypotheses using numerical
information
Descriptive vs explanatory research questions
Descriptive: use words like related, correlated, difference, or just describe characteristics without relating them
Explanatory: use words like determine, has an effect or influences
- Both can be either static or dynamic
- Static: refers to a moment in time
- What are the differences in average firm size between countries in 2015?
- Dynamic: refers to a period
- To what extent have the differences in average size of firms between countries
changed from 1995 - 2015?
Different statistical analyses to answer different types of RQs:
● Measures for central tendency, dispersion of data
● Confidence intervals for point estimation
● Correlations
● Analysis of variance (dynamic)
● Trend analysis (dynamic)
● Regression analysis
● Non-parametric analysis
Descriptive RQs don't really relate to theory, only when it's about correlation or when u analyze
Explanatory RQs do relate to theory & most of the analyses can be used
Video 1.2: Variables and data
Requirements for data:
1. Measurement levels of variables
a. There r two types of variables:
i. Categorical: contain categories to distinguish different scores; nominal (the
number for each category has no meaning other than distinguishing them) &
ordinal (the number for each category has a meaning; a rank score; distances
between the categories r not equal) measurement level
ii. Continuous: scores can always be divided further, so a new score between two existing
ones is always possible; interval (the number for each category has a meaning,
categories r ordered so the higher the nr the higher the position on the scale,
differences between the categories r equal; lacks a natural 0 point) & ratio (all
the features of the interval level but also has a natural 0 point) measurement level
1. Natural 0 point: the natural end of the scale, so like age has that bc u can't be
younger than 0
iii. → these types r ordered: lowest one is nominal, highest one is ratio → important
bc statistical analysis techniques & parameters set the requirements for
measurement levels of the variables + also depends on ur data collection tool
Starting point for statistical analysis is the data matrix (where all the information that can be analysed is stored)
Questions to ask to choose the most adequate analysis technique
1. What type of RQ needs to be answered? (descriptive/explanatory/static/dynamic)
2. What measurement level do my variables need? (nominal/ordinal/interval/ratio)
3. Are there relationships between the variables involved? If so, is it abt correlation or causal
relationship?
4. What criteria need to be met to allow a certain analysis?
→ not a fixed sequence - finding answers to these is an iterative process
Video 1.3: Describing data; central tendency
U can describe a variable using 3 characteristics:
1. Central tendency (center) - where is the midpoint of the variable? High or low on the scale?
2. Dispersion (variability) - how r the scores spread around that midpoint? More wide or more
close?
3. Shape - what does the distribution of scores look like? Is it symmetric or skewed?
→ not all variables can be described using these characteristics, applicability depends on the measurement
level of the variable
Ways of describing data:
1. Frequencies / frequency tables: about counting the number of times a certain score appears in ur
data; can be used for all measurement levels BUT might not be useful for variables with lots of
categories
a. → can convert them into % - relative frequency (10% of participants got a grade 6 for
example)
2. Central tendency: a way of summarizing a range of scores in terms of a midpoint/center;
displayed in a histogram; hierarchically ordered
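A minimal sketch of a frequency table with relative frequencies, assuming Python instead of SPSS/R. The grades are made-up data, and the names `grades`, `frequencies` and `relative` are just for illustration:

```python
from collections import Counter

# Hypothetical grades for 10 participants (made-up data, not from the lecture).
grades = [6, 7, 7, 8, 6, 9, 7, 5, 8, 7]

# Absolute frequencies: how many times each score appears.
frequencies = Counter(grades)

# Relative frequencies: each count as a percentage of all observations.
n = len(grades)
relative = {score: 100 * count / n for score, count in frequencies.items()}

# Print the table ordered from the lowest to the highest score.
for score in sorted(frequencies):
    print(f"grade {score}: {frequencies[score]} times ({relative[score]:.0f}%)")
```

With these made-up grades, 2 of the 10 participants got a 6, so the relative frequency of grade 6 is 20%.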
3 common measures of central tendency:
1. Mode = the score that occurs the most frequently; can be found by looking at the
frequency table
2. Median = the middle score when all the data r ordered from low to high; to find its position u
add 1 to the number of respondents and divide it by 2 → (n+1)/2; the median is the score at that position
3. Mean = the average score; determined by summing up all scores & dividing it by the nr of
observations; often used to describe the;
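The three measures can be sketched in Python (an assumption, the course itself uses SPSS/R). The scores are made-up data. The step for an even number of respondents (averaging the two middle scores) is the usual convention and is not spelled out in these notes:

```python
import statistics

# Hypothetical scores for 7 respondents (made-up data, not from the lecture).
scores = [8, 5, 13, 3, 6, 5, 9]
ordered = sorted(scores)
n = len(ordered)

# Mode: the score that occurs most frequently.
mode = statistics.mode(scores)

# Median: the score at position (n + 1) / 2 in the ordered data.
position = (n + 1) / 2
if position.is_integer():
    # Odd n: the position points at exactly one score (positions start at 1).
    median = ordered[int(position) - 1]
else:
    # Even n: the position falls between two scores, so average them
    # (usual convention, assumed here).
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    median = (lower + upper) / 2

# Mean: sum of all scores divided by the number of observations.
mean = sum(scores) / n

print(mode, position, median, mean)
```

Here the ordered scores are 3, 5, 5, 6, 8, 9, 13, so the median sits at position (7+1)/2 = 4, which is the score 6.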
→ not all of these measures of central tendency can be applied to variables of diff levels:
Measure of central tendency | Nominal | Ordinal | Interval | Ratio
CAN be used | Mode | Mode, median | Mode, median, mean | Mode, median, mean
MOST frequently used | Mode | Median | Symmetric variable: mean; skewed variable: median | Symmetric variable: mean; skewed variable: median
Video 1.4: Describing data - dispersion
Measures of dispersion r about how the scores of a variable r spread around the center of the variable - only useful
when the scores range from low to high, so only for ordinal, interval or ratio levels
*add the table*
3 measures of dispersion:
1. Range = highest score minus the lowest, the most basic
2. Interquartile range (iqr) = the range in which the middle 50% of the scores lie: it is the
distance between the upper quartile (Qu) and the lower quartile (Ql); relates to the median;
25% of the observations lie just above and 25% just below the median, and adding them gives u the
middle 50%
3. Standard deviation (s) = roughly the average difference between the scores & the mean,
expressed in terms of the original scale and not squared;
calculated from the variance, here for sample data: from each score u subtract the mean, u square
the difference & add all the outcomes, then u divide this by the total number of observations
minus 1; the outcome is the variance, and its square root is the standard deviation; tells u how
closely or widely the scores r spread around the mean - the higher the standard deviation the more widely they r spread
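A minimal sketch of the three measures of dispersion, assuming Python instead of SPSS/R and made-up scores. The quartiles come from `statistics.quantiles` with its default method. Quartile conventions differ between software packages, so the iqr from another tool may be slightly different:

```python
import math
import statistics

# Hypothetical scores for 7 respondents (made-up data, not from the lecture).
scores = [8, 5, 13, 3, 6, 5, 9]
n = len(scores)

# Range: highest score minus the lowest.
value_range = max(scores) - min(scores)

# Interquartile range: distance between the upper (Qu) and lower (Ql) quartile.
# quantiles(..., n=4) returns the three cut points Ql, median, Qu.
q_lower, _median, q_upper = statistics.quantiles(scores, n=4)
iqr = q_upper - q_lower

# Variance: subtract the mean from each score, square the differences,
# add them up, and divide by the number of observations minus 1.
mean = sum(scores) / n
squared_deviations = [(x - mean) ** 2 for x in scores]
variance = sum(squared_deviations) / (n - 1)

# Standard deviation: the square root of the variance,
# which brings the result back to the original scale.
std_dev = math.sqrt(variance)

print(value_range, iqr, variance, round(std_dev, 2))
```

For these scores the squared differences add up to 66, so the variance is 66 / 6 = 11 and the standard deviation is the square root of 11, about 3.32.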
Measure of dispersion | Nominal | Ordinal | Interval | Ratio
CAN be used | None | Range, iqr | Range, iqr, standard deviation | Range, iqr, standard deviation
MOST frequently used | None | iqr | Symmetric variable: standard deviation; skewed variable: iqr | Symmetric variable: standard deviation; skewed variable: iqr
→ measures for lower level variables can be used for the higher level variables but not the other way
around
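The rule that lower level measures carry over to higher levels can be sketched as a small lookup in Python. The names `LEVELS`, `NEW_AT_LEVEL` and `usable_measures` are hypothetical helpers made up for this illustration. The contents follow the "CAN be used" rows of the two tables in these notes:

```python
# Measurement levels ordered from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Measures that first become usable at each level
# (taken from the "CAN be used" rows of the two tables).
NEW_AT_LEVEL = {
    "nominal": {"central_tendency": ["mode"], "dispersion": []},
    "ordinal": {"central_tendency": ["median"], "dispersion": ["range", "iqr"]},
    "interval": {"central_tendency": ["mean"], "dispersion": ["standard deviation"]},
    "ratio": {"central_tendency": [], "dispersion": []},
}


def usable_measures(level, kind):
    """Collect the measures of one kind usable at this level or any lower one."""
    rank = LEVELS.index(level)
    measures = []
    for lower_level in LEVELS[: rank + 1]:
        measures.extend(NEW_AT_LEVEL[lower_level][kind])
    return measures


print(usable_measures("ordinal", "central_tendency"))
print(usable_measures("ratio", "dispersion"))
```

An ordinal variable gets the mode from the nominal level plus the median, but never the mean, bc that only becomes available from the interval level up.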
How to determine the level of dispersion?
- Most of the time the range is the least informative measure