Lecture 1 – Mediation
Mediation: testing theoretical mechanisms at micro-level (for example: team level)
Moderation: tests how the relationship between two variables changes as the level of the
moderator changes; the relationship can even flip sign from positive to negative and vice versa
Conditional process models: models that combine mediation and moderation
What is a mediating variable?
• A mediating variable is one that will change as a result of the influence of the IV (X),
and then will, in turn, cause a change in the DV (Y)
• Therefore, a variable like gender would not be a good candidate to be a mediating
variable
• How about team conflict? (one of the best mediators)
• The mediator has to change as a consequence of a change in your IV (X). Hence, stable
characteristics (e.g. personality traits, gender) are not good candidates for mediating
variables
So what is the goal of mediation?
• To examine the magnitude and valence of the mechanisms underlying an explanatory
variable (IV) and an outcome variable (DV)
• Provides you with a comparative assessment of the different mechanisms influencing
the outcome variable (DV)
• Basically, it answers the question of “how” your IV impacts your DV
What is the difference between a theoretical mechanism and a mediator?
• Theoretical mechanism: the argument that connects your variables to each other in
theory. Every theoretical mechanism has the potential to become a mediator
(mechanisms are unmeasured mediators)
• Mediator: implies a causal argument (Hayes, 2018, argues that you can still run these
models without being able to make 100% causal claims)
• Minimum of three variables: X, mediator (M), and Y
Main assumption: linear relationships between the variables (straight line/red line)
The mediator formula includes E = error term → the difference between the linear
regression line and the actual data point (black line)
i = intercept
a = slope → a one-unit change in X yields an a-unit change in M (e.g. with a = 2, a
2-unit change in M)
The same logic applies to the Y formula, with c’X (direct path) and bM
(mediator path)
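Written out in the usual regression notation, the two formulas described here look as follows. The subscripts on the intercepts and error terms are an assumption added for readability; the symbols i, a, b, c’ and E are the ones used in these notes.

```latex
\begin{align}
M &= i_M + aX + E_M \\
Y &= i_Y + c'X + bM + E_Y
\end{align}
```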
What are the different effects?
• Direct effect = c’ → the effect of your X variable on your Y variable that is not
mediated
• Indirect effect = a*b → the product of a (the coefficient of X in the M formula) and
b (the coefficient of M in the Y formula); the effect of X on Y mediated
through M (the indirect effect is also known as the mediated effect)
• Total effect = direct + indirect effect (c’ + a*b)
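A minimal sketch of the three effects, assuming simulated data and ordinary least squares via NumPy. The helper `ols`, the variable names and the chosen coefficients are illustrative assumptions, not part of the lecture. It estimates a, b and c’ and checks that the total effect from regressing Y on X alone equals c’ + a*b.

```python
import numpy as np

def ols(y, *predictors):
    """Return OLS coefficients [intercept, slope_1, ...] for y on the predictors."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Simulated data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                       # X, e.g. power hierarchy
m = 0.5 * x + rng.normal(size=n)             # M, e.g. team conflict
y = -0.4 * m + 0.2 * x + rng.normal(size=n)  # Y, e.g. team performance

a = ols(m, x)[1]              # path a: slope of X in the M formula
_, c_prime, b = ols(y, x, m)  # direct effect c' and path b from the Y formula
c_total = ols(y, x)[1]        # total effect: Y regressed on X alone

indirect = a * b
print(f"direct={c_prime:.3f} indirect={indirect:.3f} total={c_total:.3f}")
```

With OLS fitted on the same sample, the total effect and c’ + a*b agree up to floating-point rounding.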
X = power hierarchy; Y = Team performance; M = Team conflict
Which one is the better hypothesis, and why?
• The second one is better, because it specifies each leg of the mediation; the first
hypothesis does not. (Logic: if the legs are not specified, either a or b could be
negative and you don’t know which one, which is problematic for your
conceptual understanding)
• Most papers already avoid this problem by hypothesizing each leg beforehand
(so in this case, this would mean 3 hypotheses, the final one being the mediation hypothesis)
What is missing here?
You need to test for significance: the estimate of -.33 does not tell you whether the
indirect effect is significant or not.
Logic behind significance testing
Sampling distribution: if we repeatedly sample and 0 is not included in the middle 95%
of the resulting estimates, then the effect is statistically significant
Is the indirect effect statistically significant?
• Baron and Kenny (1986) suggested that one could use the Sobel formula to calculate
whether the size of the indirect effect was sufficiently strong to be considered
“statistically significant”.
• Note that Sobel’s formula divides the product of the unstandardized regression
coefficients (a*b) by a standard error built from the coefficients and standard errors of
the a and b pathways.
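As a sketch of what the Sobel calculation looks like, the function below uses the common first-order standard error, sqrt(b²·SE_a² + a²·SE_b²), and a normal approximation for the p-value. The function name and the input values are hypothetical; they are not taken from the lecture example.

```python
import math

def sobel_test(a, se_a, b, se_b):
    """Return (z, two-sided p) for the indirect effect a*b (first-order Sobel test)."""
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = (a * b) / se_ab
    # Two-sided p-value, assuming a*b is normally distributed
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical unstandardized coefficients and standard errors
z, p = sobel_test(a=0.50, se_a=0.10, b=-0.40, se_b=0.12)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The normal approximation in the last step is exactly the assumption that the rest of this section calls into question.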
Testing the indirect effect
Problem
• We are testing the significance of a*b
• To use the Sobel test we need to assume that a*b is normally distributed (and CIs are
symmetric)
• Even if we assume that a and b are each normally distributed, their product will not be
normal
Solution
• We need methods of testing a*b that do not assume normality!
• Bootstrapping → simulation (allows us to simulate the sampling distribution of the
estimate)
Note: when referring to the distributions of a and b, we are talking about coefficients,
not variables
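The claim that the product of two normally distributed coefficients is not itself normal can be checked with a small simulation. This is only a sketch: the means and standard deviations of a and b are arbitrary assumptions, and skewness is used as a simple indicator of asymmetry.

```python
import numpy as np

rng = np.random.default_rng(1)
draws = 100_000
# Hypothetical sampling distributions of the two coefficients
a = rng.normal(loc=0.5, scale=0.2, size=draws)
b = rng.normal(loc=0.5, scale=0.2, size=draws)
ab = a * b  # the indirect effect in each simulated sample

def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    centered = values - values.mean()
    return (centered**3).mean() / (centered**2).mean() ** 1.5

print(f"skew(a)={skewness(a):.2f} skew(b)={skewness(b):.2f} skew(a*b)={skewness(ab):.2f}")
```

The skewness of a and b stays close to 0, while the skewness of a*b is clearly positive, so a symmetric CI around a*b is a poor fit.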
Hypothesis Testing with CIs
• When testing the significance of a*b with bootstrapping etc. we use a CI (confidence
interval) to test our null hypothesis.
• H0: a*b = 0
• If a*b is significant, we say that an estimate at least this far from 0 would occur in
fewer than 5% of samples if a*b = 0 in the population
• A 95% CI provides the same information
• If 0 is not within the 95% CI: we reject H0 (a*b = 0) at the 5% level. Significant
mediation effect.
• If 0 is within the 95% CI: we cannot reject H0 at the 5% level. Non-significant
mediation effect.
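The decision rule above fits in a few lines of code. The function name and the interval limits in the example calls are hypothetical.

```python
def indirect_effect_is_significant(ci_lower, ci_upper):
    """Return True when the CI excludes 0, i.e. H0: a*b = 0 is rejected."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return not (ci_lower <= 0 <= ci_upper)

print(indirect_effect_is_significant(-0.55, -0.12))  # interval lies entirely below 0
print(indirect_effect_is_significant(-0.20, 0.08))   # interval straddles 0
```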
Bootstrapping
• Steps for bootstrapping
1. Draw a sample from the data of size n with replacement
2. Fit your model(s) to this data (e.g., estimate both a and b in two regressions)
3. Save the parameter estimates from Step 2
4. Repeat Steps 1-3 thousands of times
5. The parameter estimates saved in Step 3 form a distribution for each parameter estimate
6. The 2.5th and 97.5th percentiles of the distribution form the 95% CI
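The steps above can be sketched as a percentile bootstrap in Python with NumPy. This is an illustration under stated assumptions, not the procedure used in the course software: the data are simulated, and the function names, the number of resamples (2000) and the seeds are arbitrary choices.

```python
import numpy as np

def indirect_effect(x, m, y):
    """Estimate a*b from two OLS regressions: M on X, and Y on X and M."""
    ones = np.ones(len(x))
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return a * b

def bootstrap_ci(x, m, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect a*b."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):                     # Step 4: repeat many times
        idx = rng.integers(0, n, size=n)        # Step 1: n rows, with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])  # Steps 2-3
    tail = (1 - level) / 2 * 100
    lower, upper = np.percentile(estimates, [tail, 100 - tail])  # Steps 5-6
    return lower, upper

# Simulated team-level data (hypothetical, for illustration only)
rng = np.random.default_rng(42)
n = 300
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = -0.6 * m + 0.1 * x + rng.normal(size=n)

lower, upper = bootstrap_ci(x, m, y)
print(f"a*b = {indirect_effect(x, m, y):.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Because the interval comes from percentiles of the resampled estimates, it does not have to be symmetric around a*b.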
! The bootstrap can pick the same team more than once, because every team is put back before
the next draw (simple random sampling with replacement). This is not a problem, because teams
are interchangeable. (One team represents all the teams that are similar to that team, so it
does not matter if you pick the same ‘type’ of team twice)
! Each bootstrap sample yields its own estimated regression equations, and together the
estimates form an empirical distribution → the normal distribution does not describe it very
well (the distribution of a*b is a bit asymmetric)
- The PROCESS plug-in (macro) for SPSS by Andrew Hayes calculates the bootstrap