05/02/2025 - Lecture 1
There is an overlap between quantitative and qualitative data → when you carry out statistical
research both are essential
With statistical techniques you need to be able to make certain decisions. This course is about
reflective ways of thinking about research.
Multivariate analysis: useful in complex research with three or more variables. This
course covers techniques that are usable for small-scale research with a limited number of research
participants. Usually more data is better, but there are techniques for exploring small data sets.
MCA (multiple correspondence analysis) & CATPCA: explore relationships between multiple variables in
small data sets. Multiple regression & PCA → the most used multivariate techniques right now.
Principal component analysis (PCA) is a good introduction to multivariate analysis.
Two variables → statistical relationships. Significant does not automatically mean relevant.
In an MCA plot, the values (categories) of the variables are shown as points
Political preference: nominal
Educational level: ordinal
Social distance: scale
→ the technique can plot as many variables together as you want (but with too many variables
the plot becomes chaotic/overwhelming)
Statistical relationship: e.g. people with a higher income often also have a higher level of education.
These values are often scored together in the data set (check with a bivariate test)
→ Values that often occur together are plotted close together
Most of the exam will be multiple choice; it is a closed-book exam. You need a pass grade for the exam and
the assignments together. Everything is a group assignment, except for the exam.
→ Examples will be given during the course
For each assignment you need to make a data analysis plan: a plan of the steps you take during the data
analysis. You also need to make clear what each author contributed to the assignment.
Question about an assignment? → send an email and include the data set → discussed during Q&A
Significant vs. important statistical relationship: just because something comes
out as a significant statistical relationship does not mean it is important or relevant.
Positive statistical correlation: grade goes up, year goes up (example) → this is a relevant
correlation.
The significance threshold for the 𝘱-value is not always 0.05: it depends on the question/context of your study
Ex.: Test whether a medicine (for a cold) is safe → the threshold should be smaller (stricter), because you do
not want people to get sick
Ex.: Test of a medicine for the last phase of life → a threshold of 0.05 can be acceptable, because of the context
MCA/CATPCA: can still be used with 30 research participants
You need to check/explore the statistical relationships between variables (bivariate) before doing
assignment 1.
Refresh your knowledge of SPSS; links on BS
Recap Statistics
Measurement/analysis level
● Nominal: values have no particular order; one is not better/higher than the other
● Ordinal: there is an order, but no fixed steps that say how much higher A is than B
● Interval: there is an order with equal steps, such as degrees Celsius (distances between
values have meaning)
● Ratio: like interval, but with a true zero point, so ratios between values are meaningful (e.g. meters, income)
● When you select a statistical test to explore relationships between variables, the test that
you select depends on the measurement level. Ordinal: Kendall's tau; interval: Pearson's r; etc.
● Always use the test that uses the most information, at the highest measurement level possible
○ Level of education/income: Pearson's r, Kendall's tau
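The rule above (the test depends on the measurement level) can be sketched as a small lookup. This is only an illustration: the function name `suggest_bivariate_test` is hypothetical, the choice of the chi-square test for nominal data is taken from the "What test?" list in these notes, and the idea that the lower of the two measurement levels decides is an assumption that is standard practice but was not stated explicitly in the lecture.

```python
# Measurement levels from lowest to highest amount of information.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Tests named in these notes, keyed by the lowest level involved.
TEST_BY_LEVEL = {
    "nominal": "chi-square test",
    "ordinal": "Kendall's tau",
    "interval": "Pearson's r",
    "ratio": "Pearson's r",
}


def suggest_bivariate_test(level_a, level_b):
    """Suggest a test for two variables (assumption: the lower level decides)."""
    lowest = min(level_a, level_b, key=LEVELS.index)
    return TEST_BY_LEVEL[lowest]


# Education (ordinal) with income (ratio): the ordinal variable limits the choice.
suggestion = suggest_bivariate_test("ordinal", "ratio")
print(suggestion)
```

In SPSS you still pick the test yourself; the sketch only makes the decision rule explicit.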
Assumptions
● Parametric tests are seen as more powerful
○ They are based on the normal distribution and make the assumptions below
○ Linearity & additivity: e.g. for age & income you assume the relationship is linear (so when you are
younger you earn less and when you are older you earn more)
○ Normality
○ Homogeneity of variance (homoscedasticity)
Variance & covariance
● The variance tells us by how much scores deviate from the mean for a single variable.
○ To see whether, as one variable increases, the other increases, decreases or
stays the same, we calculate the covariance
● Covariance depends on the units of measurement → that's the problem with it
○ Solution: divide the covariance by the product of the standard deviations of both variables.
○ The standardized version of covariance is known as the correlation coefficient
● Importance of standardization: Example: when prices are in a different currency, you convert the
cost to the currency you're used to. → A common scale lets you compare things
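A minimal sketch of why standardization matters, using made-up data (the `age` and `income_eur` values are hypothetical and not from the course). It computes the sample covariance and then divides by the product of the standard deviations to get the correlation coefficient. Changing the unit of income from euros to cents changes the covariance but not the correlation.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def std_dev(xs):
    """Sample standard deviation: square root of the variance."""
    return sqrt(covariance(xs, xs))


def correlation(xs, ys):
    """Covariance divided by the product of both standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Hypothetical data for five respondents.
age = [25, 35, 45, 55, 65]
income_eur = [20000, 28000, 35000, 41000, 50000]
income_cents = [v * 100 for v in income_eur]  # same incomes, different unit

cov_eur = covariance(age, income_eur)
cov_cents = covariance(age, income_cents)
r_eur = correlation(age, income_eur)
r_cents = correlation(age, income_cents)

print(cov_eur, cov_cents)  # the covariance is 100 times larger in cents
print(r_eur, r_cents)      # the correlation is the same in both units
```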
What test?
● Regression: directional relationship
○ Tests for a statistical relationship at interval level. The difference with Pearson's r is that
regression also provides a model that you can use to think about whether one
variable might depend on another, which is useful for exploring causal
relationships. But you can never really use regression to prove causation;
for that you need a theory or an experiment.
● Correlation: relationship
● Nonparametric: tests that make fewer assumptions about the distribution of the data, so also usable with smaller data sets.
● Chi-square test: association between two categorical variables
● T-test: differences between the means of two groups
● Parametric vs robust vs nonparametric tests
Assignment 1
Create your own tables, do not copy and paste from SPSS, and check your layout → this means: do not
include the unnecessary info that SPSS adds. You can assume that the reader is someone in
academia who is not familiar with how these techniques work.
● Preferably 5 or more variables
● Minimum of 2 research questions
● Make an opening statement on how many people are in your research and their relevant demographics,
to introduce the reader to the people you interviewed. Here you can use graphs/frequency tables, but only
use them when necessary (when you can do it in a sentence, do that instead)
● Your raw data should be in your data set
07/02/2025 - Lecture 2
Multiple response questions in this course → split the question into several variables
!!! For example, we can look at a few options from that question and code each option
as yes/no. See the question: What kind of activities does your community project
regularly organize?
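The splitting described above can be sketched as follows. The answer options (`sports`, `workshops`, `festivals`), the respondent data and the function name `split_multiple_response` are all hypothetical and only illustrate the idea: one multiple response question becomes one yes/no (1/0) variable per option.

```python
# Hypothetical answers; each respondent could tick several activities.
responses = [
    {"id": 1, "activities": ["sports", "workshops"]},
    {"id": 2, "activities": ["workshops"]},
    {"id": 3, "activities": []},
]
options = ["sports", "workshops", "festivals"]


def split_multiple_response(rows, options, field="activities"):
    """Turn one multiple response question into one 0/1 variable per option."""
    out = []
    for row in rows:
        new_row = {"id": row["id"]}
        for opt in options:
            # 1 = yes (option ticked), 0 = no (option not ticked)
            new_row[f"act_{opt}"] = 1 if opt in row[field] else 0
        out.append(new_row)
    return out


dichotomized = split_multiple_response(responses, options)

# Frequency per option: how many respondents ticked it.
counts = {opt: sum(r[f"act_{opt}"] for r in dichotomized) for opt in options}
print(dichotomized)
print(counts)
```

Each new 0/1 variable can then be used like any other dichotomous variable in a bivariate test.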
You can use the same variables for multiple questions.
"Other, namely" → this is still a quantifiable category, but you did not give this category a value/meaning yourself.
Your options:
- Stay with "other"
- If many people chose this value and specified it, it might be interesting to explore which
other reasons people mentioned. If you find that more than 4-5 people gave the same reason, you can
create a new category. The advantage is that you include more information this way.
Issue of Recoding
- Think about how you are recoding and why: it depends on your research
questions and also influences them
- Think about how your research question might help you design a recode strategy that is
meaningful
- Sometimes, depending on your research question and when you have a small data set, you can
combine categories within a question, e.g. protestant, catholic → christian.
- Many statistical tests do not work well when you have too many categories. When only a
small group has a certain characteristic, you cannot say anything general about it.
- When you recode, do it in a different variable (Transform → Recode into Different Variables).
If you recode into the same variable, you overwrite your raw data
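The advice to recode into a different variable can be illustrated outside SPSS as well. This is a sketch with hypothetical names (`religion`, `religion_rec`, `recode_into_new_variable`) and made-up data: the recoded values go into a new variable and the original variable stays untouched, so the raw data is never lost.

```python
# Recode scheme from the example above: protestant, catholic -> christian.
RELIGION_RECODE = {"protestant": "christian", "catholic": "christian"}


def recode_into_new_variable(rows, source, target, mapping):
    """Return copies of the rows with a new variable; the source variable is kept."""
    recoded = []
    for row in rows:
        new_row = dict(row)  # copy, so the raw data is not modified
        # Values without a recode rule keep their original value.
        new_row[target] = mapping.get(row[source], row[source])
        recoded.append(new_row)
    return recoded


# Hypothetical raw data.
data = [{"religion": "protestant"}, {"religion": "catholic"}, {"religion": "muslim"}]
recoded = recode_into_new_variable(data, "religion", "religion_rec", RELIGION_RECODE)
print(recoded)
```

Because the original `religion` variable is still there, you can always try a different recode scheme later.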
Multivariate analysis: intro (1)
● More than two variables
● Statistical relationships between variables:
○ What it means when there is a relationship between variables → when there is a
pattern
○ Example:
When studying a group of people: education (low/high), income (low/high)