Research methodology for international students
Week 1
Hypotheses and conceptual models
- Types of hypothesis
o Bivariate hypothesis: expected relationship between two variables
X → Y
X = independent variable (cause)
Y = dependent variable (outcome)
→ = direction of the effect
The distinction between metric and non-metric refers to the ‘measurement
level’ of a variable
Hypotheses that include ordinal variables are formulated in the same way as
hypotheses with metric variables
Making the distinction is vital for deciding which statistics to use
The formulation of a hypothesis should be consistent with this distinction
This applies to all types of hypotheses!
o Multivariate hypothesis: expected relationship between a dependent variable
Y and multiple independent variables X1…n
1. Relative importance of independent variables (multiple causality).
2. Mediation: interpretation of a relationship. The effect of the independent
variable (X1) on the dependent variable (Y) is indirect: X1 affects the
intervening or mediating variable (X2), which in turn has an effect on the
dependent variable (Y).
X1 → X2 → Y
Partial mediation = direct + indirect effect
3. Moderating effect: interaction hypothesis. The effect of X1 on Y is
conditional on the moderator (X2); or: the effect of X1 on Y is different
depending on the value of the moderator X2.
Conditional effect: intensifier (+) or suppressor (-) effect
X5 → Y
X4 → (the arrow from X5 to Y)
4. Spurious relationship: common cause (antecedent), explanatory
hypothesis (=explanation). An observed relationship between X1 and Y is
spurious because they share a common cause X2
X2 → X1
X2 → Y
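The moderating effect (type 3 above) can be illustrated with a minimal Python sketch. It assumes invented data in which the slope of Y on X1 is 1 when the moderator X2 equals 0 and 3 when X2 equals 1. The helper name `slope` and all numbers are illustrative assumptions, not taken from the course material.

```python
def slope(xs, ys):
    """Least-squares slope of y on x: covariance divided by the variance of x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x

# Invented cases (X1, X2, Y): Y = 1 + 1*X1 when X2 = 0, Y = 1 + 3*X1 when X2 = 1.
cases = [(x1, x2, 1 + (1 + 2 * x2) * x1) for x2 in (0, 1) for x1 in (0, 1, 2, 3)]

# Estimate the effect of X1 on Y separately for each value of the moderator X2.
slopes = {}
for value in (0, 1):
    xs = [x1 for x1, x2, _ in cases if x2 == value]
    ys = [y for _, x2, y in cases if x2 == value]
    slopes[value] = slope(xs, ys)

print(slopes)
```

In this invented example the effect of X1 on Y is larger when X2 = 1, so X2 would be read as an intensifier (+). With real data one would typically test this with an interaction term rather than by comparing subgroup slopes by eye.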
- Conceptual model
o Graphical representation of a set of logically connected hypotheses, full
picture
o A set of hypotheses allows one to draw the conceptual model
o From a given conceptual model one can reconstruct the original hypotheses
Causality, unit of analysis, fallacies
Causality
- 3 conditions to establish causality
o Association: statistical relationship between the variables
Need not be a ‘perfect’ relationship
Often only ‘weak’ relationships are observed, due to measurement error or
multicausality
o Direction of the relationship
Cause → consequence
The independent variable influences the dependent variable (not the other
way around)
Sometimes obvious, for instance: characteristics that are fixed by birth
(birth year, parental background…) or events in the past
But not always, for instance: does ethnocentrism influence contact with
immigrants, or does having contact with immigrants influence a
person’s level of ethnocentrism?
Note: hypotheses usually define a direction of the relationship
o Nonspuriousness (or absence of spuriousness)
No extraneous variables or antecedents may explain the
relationship between the variables of interest
This is why taking the effect of antecedents into account is crucial
to establish causality in an empirical situation
Note: antecedents are often called control variables. If the relationship
is still observed after the control variables are included, it is more
likely to be non-spurious, and more so the more control variables the model
contains
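The role of a control variable can be sketched with a first-order partial correlation, which is one common way (not necessarily the one used in the course) to check whether an association survives once an antecedent is held constant. The data below are invented: X2 is a common cause of X1 and Y, and there is no direct link between X1 and Y. The function names `pearson` and `partial_correlation` are illustrative.

```python
from itertools import product
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented, balanced design: common cause z and two independent disturbances.
rows = list(product((-1, 1), repeat=3))   # (z, e1, e2)
x2 = [z for z, _, _ in rows]              # antecedent (common cause)
x1 = [z + e1 for z, e1, _ in rows]        # depends on x2 only
y = [z + e2 for z, _, e2 in rows]         # depends on x2 only

r_x1_y = pearson(x1, y)
r_controlled = partial_correlation(r_x1_y, pearson(x1, x2), pearson(y, x2))

print(round(r_x1_y, 3), round(r_controlled, 3))
```

Here X1 and Y are clearly associated at the bivariate level, but the association disappears once X2 is controlled for, which is the pattern expected for a spurious relationship.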
Unit of analysis
1. Unit of analysis
- About whom or what statements are made in the research
- Note: the unit of observation may differ from the unit of analysis in a study
2. Nested data
- Multilevel data
- Combining data from different units of observation, in which individual cases
constitute elements of larger groups (aggregates)
- Sources of information at group level
o National and regional statistics
o Data from previous research (=external aggregation)
o Aggregating individual level data (=internal aggregation)
3. How to define the unit of analysis with multilevel data
- Define the unit of analysis for each variable in the hypothesis
- The unit of analysis for testing the hypothesis is the smallest of these units,
with the individual case as the smallest possible unit overall (order from small to
large: case, subgroup, group)
- Tip: often (but not always) the dependent variable defines what the unit of analysis is
- Examples:
o The gross domestic product of countries has a positive effect on the level of
optimism of the citizens
Gross domestic product → national level
Level of optimism → citizen level
So, the unit of analysis is citizens
o The average job satisfaction within teams increases with the level of
gender balance within teams
Average satisfaction → individual information is used to compute the average, so
team level
Gender balance within teams → team level
So, the unit of analysis is teams, because both variables are measured at team level
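The second example relies on internal aggregation: individual-level records are combined into team-level variables. A minimal Python sketch, assuming invented employee records and a hypothetical gender balance index (1 = equal shares, 0 = single-gender team) that is not defined in the course material:

```python
from collections import defaultdict

# Invented individual-level records (unit of observation: employee).
employees = [
    {"team": "A", "gender": "f", "satisfaction": 7},
    {"team": "A", "gender": "m", "satisfaction": 9},
    {"team": "B", "gender": "m", "satisfaction": 4},
    {"team": "B", "gender": "m", "satisfaction": 6},
    {"team": "B", "gender": "m", "satisfaction": 5},
    {"team": "B", "gender": "f", "satisfaction": 5},
]

# Group the employees by team.
by_team = defaultdict(list)
for person in employees:
    by_team[person["team"]].append(person)

# Aggregate to team level (unit of analysis: team).
teams = {}
for name, members in by_team.items():
    share_f = sum(p["gender"] == "f" for p in members) / len(members)
    teams[name] = {
        "mean_satisfaction": sum(p["satisfaction"] for p in members) / len(members),
        # Hypothetical index: 1 when shares are equal, 0 for a single-gender team.
        "gender_balance": 1 - abs(share_f - 0.5) * 2,
    }

print(teams)
```

After aggregation each team is one case with two team-level variables, which is why the hypothesis is tested with teams as the unit of analysis.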
Logical fallacies
- Drawing conclusions at one level while analyzing findings at another level
1. Ecological fallacy: drawing conclusions at the individual level from analyses of
group-level data
o Finding: the higher the level of unemployment in different regions the higher
the % of extreme-right voting
o Interpretation: unemployed people tend to vote more extreme right than
employed people
2. Atomistic fallacy: drawing conclusions at the aggregate level from analyses of
individual-level data
o Finding: the likelihood of dying from coronary heart disease decreases the
higher the income of that person
o Interpretation (reasoning at the country level): so, a decrease in mortality from
coronary heart disease is linked to the wealth of the country (or in layman’s
terms: the richer a country, the less likely people are to die from coronary heart
disease)
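Why the ecological fallacy is a fallacy can be shown with a small Python sketch. The counts below are invented so that regions with more unemployment have a higher extreme-right vote share, while none of the unemployed individuals vote extreme right. All names and numbers are illustrative assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented counts per region; "_er" = number of extreme-right voters in that group.
regions = {
    "A": {"unemployed": 10, "unemployed_er": 0, "employed": 90, "employed_er": 9},
    "B": {"unemployed": 20, "unemployed_er": 0, "employed": 80, "employed_er": 16},
    "C": {"unemployed": 30, "unemployed_er": 0, "employed": 70, "employed_er": 21},
}

# Group level: unemployment rate and extreme-right vote share per region.
unemployment = []
er_share = []
for r in regions.values():
    total = r["unemployed"] + r["employed"]
    unemployment.append(r["unemployed"] / total)
    er_share.append((r["unemployed_er"] + r["employed_er"]) / total)
r_regions = pearson(unemployment, er_share)

# Individual level: extreme-right voting rate by employment status.
rate_unemployed = (sum(r["unemployed_er"] for r in regions.values())
                   / sum(r["unemployed"] for r in regions.values()))
rate_employed = (sum(r["employed_er"] for r in regions.values())
                 / sum(r["employed"] for r in regions.values()))

print(round(r_regions, 3), rate_unemployed, round(rate_employed, 3))
```

The regional correlation is strongly positive, yet in these invented data it is the employed who vote extreme right. The group-level finding therefore does not license the individual-level interpretation.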
Elaboration
- Elaboration (chapter 15 p. 455-462): enhancing or ‘elaborating’ our understanding of a
bivariate relationship by introducing a third ‘control’ variable in contingency tables (or
cross tabulations). Applies to moderation, mediation and spuriousness.
- Control variable: common cause
- Simpson’s paradox
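Simpson’s paradox can be sketched as an elaboration of a contingency table: the bivariate relationship points one way, but within every category of the control variable it points the other way. The counts below are invented for illustration, and the names `table` and `rate` are assumptions.

```python
# Invented counts: (x, z) -> (number of successes, number of cases),
# where x is the independent variable and z the control variable.
table = {
    (1, "low"): (90, 100),
    (1, "high"): (300, 500),
    (0, "low"): (400, 500),
    (0, "high"): (50, 100),
}

def rate(x, z=None):
    """Success rate for a value of x, overall or within one category of z."""
    cells = [counts for (xi, zi), counts in table.items()
             if xi == x and (z is None or zi == z)]
    successes = sum(s for s, _ in cells)
    cases = sum(n for _, n in cells)
    return successes / cases

# Bivariate table: x = 1 looks worse than x = 0.
overall = {x: rate(x) for x in (0, 1)}

# Elaborated tables: within each category of z, x = 1 looks better.
within = {z: {x: rate(x, z) for x in (0, 1)} for z in ("low", "high")}

print(overall)
print(within)
```

The reversal arises because the cases with x = 1 are concentrated in the category of z with the lower success rate, so z has to be controlled for before the relationship between x and the outcome is interpreted.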