Table of contents
Data exploration
Multivariate dataset
Metric vs non-metric measurement
Nonmetric data
Metric data
Data visualization
Boxplot
Scatterplot
Multivariate outliers
Mahalanobis Distance (MD)
Checking for normality
Missing data
Missing Completely At Random (MCAR)
Missing At Random (MAR)
Not Missing At Random (NMAR)
Exploratory Factor Analysis (EFA)
The factor model
Finding the right factor model
The number of factors
Factor loadings
Factor rotation
Factor reliability
Factor validity
Reporting of EFA
The regression model
Ordinary Least Squares (OLS)
Assumptions in linear regression
Linearity (assumption 1)
Model comparison
Normally distributed error term (assumption 2)
Homoskedasticity (assumption 3)
Exogeneity (assumption 4)
Multicollinearity (no assumption)
Interesting cases
An explanatory variable is non-metric
1ZM31 - course summary 1
An explanatory variable is the output of a factor analysis
The effect of one variable depends on another variable
The logit model
Linear Probability Model
The logit model
Including another dummy variable
Maximum Likelihood
Logit model fit
Akaike Information Criterion (AIC)
HIT rate
Sources of uncertainty
More about interpretation
Structural Equation Modeling (SEM)
Data exploration
Multivariate dataset
→ several variables are measured for each unit of analysis.
q variables
n units of analysis
Fits in a rectangle
Rows/units of analysis can be shuffled
Columns/variables can be shuffled
An example of a multivariate dataset.
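The rectangular structure can be sketched in a few lines of Python. The variable names and values below are made up for illustration and are not taken from the course data. The sketch checks that the data fit in an n x q rectangle and that shuffling rows or reordering columns leaves the information unchanged.

```python
import random

# Hypothetical dataset: n = 4 units of analysis (rows), q = 3 variables (columns).
columns = ["height_cm", "employees", "eu_citizen"]
rows = [
    [172, 12, 1],
    [165, 250, 0],
    [180, 3, 1],
    [158, 40, 1],
]

n, q = len(rows), len(columns)

# Every row has exactly q entries, so the data fit in an n x q rectangle.
is_rectangular = all(len(row) == q for row in rows)

# Shuffle the rows: the order of the units carries no information.
rng = random.Random(0)
shuffled_rows = rows[:]
rng.shuffle(shuffled_rows)

# Reorder the columns: apply the same permutation to the names and to every row.
perm = [2, 0, 1]
reordered_columns = [columns[j] for j in perm]
reordered_rows = [[row[j] for j in perm] for row in shuffled_rows]


def as_records(names, data):
    """Turn each row into sorted (variable, value) pairs, then sort the rows."""
    return sorted(sorted(zip(names, row)) for row in data)


# The content is identical once row and column order are ignored.
same_content = as_records(columns, rows) == as_records(reordered_columns, reordered_rows)
print(n, q, is_rectangular, same_content)
```

The comparison works because each value stays attached to its variable name and to its unit; only the position in the rectangle changes.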
Metric vs non-metric measurement
Types of data.
Nonmetric data
  Nominal scales → no ordering
    Dummy variable (0/1)
      e.g. EU citizen: yes/no
    Categorical variable
      e.g. gender: nonbinary/female/male
      e.g. transportation: bike/foot/car
  Ordinal scales → ordering
    e.g. education: high school/bachelor/PhD
Metric data
  Interval scales → no meaningful absolute zero
    e.g. temperature → 10°C is not twice as warm as 5°C
  Ratio scales → meaningful absolute zero
    e.g. height of a person
    e.g. the number of employees
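The measurement scales above determine how a variable can be encoded for analysis. The following is a minimal Python sketch under assumed example values (the observations are hypothetical): a nominal variable becomes 0/1 dummy variables, an ordinal variable becomes rank codes, and the temperature example is checked by converting Celsius to Kelvin, a scale with a meaningful absolute zero.

```python
# Hypothetical observations; the values are illustrative only.
transport = ["bike", "car", "foot", "bike"]                 # nominal: no ordering
education = ["bachelor", "high school", "PhD", "bachelor"]  # ordinal: ordered levels

# Nominal -> one 0/1 dummy variable per category.
categories = sorted(set(transport))  # alphabetical order, chosen arbitrarily
dummies = [[1 if value == c else 0 for c in categories] for value in transport]

# Ordinal -> rank codes that preserve the ordering of the levels.
# The distance between two ranks has no meaning, only their order does.
levels = ["high school", "bachelor", "PhD"]
education_rank = [levels.index(value) for value in education]

# Interval scale: a ratio of Celsius values is not meaningful.
ratio_celsius = 10 / 5
# On the Kelvin scale (absolute zero at 0 K) the ratio can be interpreted.
ratio_kelvin = (10 + 273.15) / (5 + 273.15)

print(categories)
print(dummies)
print(education_rank)
print(ratio_celsius, round(ratio_kelvin, 3))
```

The Kelvin ratio is close to 1, which matches the statement that 10°C is not twice as warm as 5°C.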
Data visualization
→ to get a feel for the data.
What is measured?
What are “normal” values?
How much variation is there in the data?
Are there groups in the data?
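Before plotting, the questions above can also be approached numerically. This is a small sketch using only Python's standard library; the group labels and values are hypothetical, and the mean and standard deviation are just one possible choice of summary.

```python
import statistics
from collections import defaultdict

# Hypothetical data: (group label, measured value) pairs.
observations = [
    ("bike", 4.0), ("bike", 6.0), ("bike", 5.0),
    ("car", 20.0), ("car", 24.0), ("car", 22.0),
]
values = [v for _, v in observations]

# What are "normal" values? Summarise the centre of the data.
centre = statistics.mean(values)

# How much variation is there? Use the sample standard deviation.
spread = statistics.stdev(values)

# Are there groups? Compare the mean within each group.
by_group = defaultdict(list)
for label, v in observations:
    by_group[label].append(v)
group_means = {label: statistics.mean(vs) for label, vs in by_group.items()}

print(centre, group_means)
```

Here the overall mean lies between the two group means and describes neither group well, which is a hint that the data contain groups.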
Boxplot
Example of a boxplot.
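The numbers behind a boxplot can be computed directly. The sketch below assumes the common convention that whiskers extend to the most extreme points within 1.5 times the interquartile range of the box; the summary itself does not state which convention the course uses, and the data are hypothetical.

```python
import statistics

# Hypothetical data; the value 30 is deliberately extreme.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Quartiles: the box spans q1 to q3, with a line at the median.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are drawn separately.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Everything outside the fences is flagged as a potential outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```

Other quartile definitions exist (for example `method="exclusive"`), so box edges can differ slightly between software packages.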
Scatterplot