Causal Analysis Techniques Study Summary 2019
SNHT
Significance Hypothesis Testing; Statistical Hypotheses
- Null hypothesis: means of all groups are the same; M = M1 = M2
o Group membership (x) cannot explain systematic differences in Y scores
- Alternative hypothesis: there is at least one difference;
M!= M1 = M2; M = M1 != M2; M != M1 != M2
T-test: significance test for comparing 2 groups (2 means)
Type 1 Error
= the mistake of finding an effect in the sample that does not exist in the population
= falsely rejecting H0
- when we find a value that is so extreme that it is highly unlikely that H0 is true
- alpha: proportion of possible samples from the population that are extreme when
we assume H0 is true (alpha = 0.05 → significant difference in 1 of 20 samples despite H0)
Type II Error:
= the mistake of not finding an effect in a sample that exists in the population
= falsely retaining H0 (failing to reject H0 although H1 is true)
Inflated Type 1 Error:
- Problem: a 5% mistake is allowed every time, so an effect could accidentally be
found if the test is done many times (by sampling extreme values)
- When we do multiple tests, we can compute the probability of at least one Type I Error:
1 – (1 – alpha)^c (c = number of tests)
→ better to use ANOVA or F-test (1 big test that compares all groups at once)
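A minimal sketch of the inflation formula, assuming the c tests are independent and each uses the same alpha. The function name familywise_error_rate is made up for this illustration.

```python
def familywise_error_rate(alpha: float, c: int) -> float:
    """Probability of at least one Type I error across c independent tests."""
    return 1 - (1 - alpha) ** c


# Comparing 3 groups pairwise takes 3 t-tests (1 vs 2, 1 vs 3, 2 vs 3).
for c in (1, 3, 10):
    print(c, round(familywise_error_rate(0.05, c), 4))
```

With alpha = 0.05 the risk is 5% for one test, about 14% for 3 tests and about 40% for 10 tests, which is why a single overall test is preferred.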
1. ANOVA – Analysis of Variance
- categorical X
- few variables
- simple relationship
Goal: relate the scores on one variable (Y) to the scores on another, categorical variable (X),
i.e. find systematic differences, and make a statement about possible significant differences
between the mean scores of the groups
Variance Components
- Between-group deviation/ variance (belong to different groups)
o All factors that cause systematic differences between groups
o E.g. company divisions: same manager, office building, colleagues, workflow
- Within-group deviation/ variation
o All other factors that cause differences between group-members
o E.g. company divisions: sex, age, salary, personality, relationships, habits,…
Grand mean
*The mean is the best prediction if nothing is known about a group
Calculated from the Y variable: sum of all scores divided by sample size
Group mean
Sum of all scores of one group divided by group size
Deviation Scores
1. Total Deviation: Deviation of an individual score from the grand mean
𝒀(𝒊𝒋) – 𝑴(𝒚)
→ + or – values
Total deviation has 2 components:
- between group deviation: How different is a certain group compared to other
groups?
- within group deviation: How different is a particular member compared to other
members of a group?
2. Within-group deviation: Deviation of an individual score from the group mean
(𝒀(𝒊𝒋) – 𝑴(𝒊)) = 𝜺(𝒊𝒋)
𝜀 (ij) = residual Error
3. Between-group deviation: Deviation of the group mean from the grand mean
(𝑴(𝒊) – 𝑴(𝒚)) = 𝜶 (𝒊)
𝛼(i) = Effect for group i
→ Total deviation = (𝒀(𝒊𝒋) – 𝑴(𝒚)) = (𝒀(𝒊𝒋) – 𝑴(𝒊)) + (𝑴(𝒊) – 𝑴(𝒚))
= within group deviation + between group deviation
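The decomposition can be checked on a small example. This is only a sketch: the two groups and their scores are made up, and the variable names are chosen for this illustration, with comments mapping them to the symbols used above.

```python
# Hypothetical scores for two groups (made-up numbers, not from the course).
groups = {"A": [4.0, 6.0, 8.0], "B": [10.0, 12.0, 14.0]}

all_scores = [y for scores in groups.values() for y in scores]
grand_mean = sum(all_scores) / len(all_scores)      # M(y)

rows = []
for name, scores in groups.items():
    group_mean = sum(scores) / len(scores)          # M(i)
    between = group_mean - grand_mean               # alpha(i), effect for group i
    for y in scores:
        total = y - grand_mean                      # Y(ij) - M(y)
        within = y - group_mean                     # epsilon(ij), residual error
        # Total deviation = within-group + between-group deviation.
        assert abs(total - (within + between)) < 1e-12
        rows.append((name, y, total, within, between))

for row in rows:
    print(row)
```

Each printed row holds the group, the score, and its total, within-group and between-group deviation. Every member of a group shares the same between-group deviation, while the within-group deviation differs per person.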
Sum of Squares
- overall difference between people
- deviation scores only pertain to one individual
- combine all deviation scores into one number (square before summing, otherwise it
sums to 0)
Sum of squares within
= square each within-group deviation, sum over the members of a group, then sum over all groups
Sum of squares between
= square each group's between-group deviation (group mean – grand mean) and count it once for
each group member (i.e. multiply by group size), then sum over all groups
*if people are added to groups, the SS increases (unless they score the mean)
Degrees of freedom:
- adjust sample size to match amount of independent information (last bit can always
be computed)
Df within = N – k (N = total sample size, k = number of groups)
Df between = k – 1
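A short sketch of the sums of squares and degrees of freedom, assuming a one-way design with made-up scores. The function anova_sums_of_squares and its dictionary keys are hypothetical names for this illustration. SS between weights each group's squared deviation by its group size, so SS total = SS within + SS between.

```python
def anova_sums_of_squares(groups):
    """Return SS and df components for a list of groups (each a list of scores)."""
    all_scores = [y for scores in groups for y in scores]
    n_total = len(all_scores)                       # N
    k = len(groups)                                 # number of groups
    grand_mean = sum(all_scores) / n_total

    ss_within = 0.0
    ss_between = 0.0
    for scores in groups:
        group_mean = sum(scores) / len(scores)
        # Squared deviation of each score from its own group mean.
        ss_within += sum((y - group_mean) ** 2 for y in scores)
        # Squared group effect, counted once per group member.
        ss_between += len(scores) * (group_mean - grand_mean) ** 2

    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
    return {
        "ss_within": ss_within,
        "ss_between": ss_between,
        "ss_total": ss_total,
        "df_within": n_total - k,
        "df_between": k - 1,
    }


# Hypothetical data: two groups of three people each.
result = anova_sums_of_squares([[4.0, 6.0, 8.0], [10.0, 12.0, 14.0]])
print(result)
```

For these numbers SS within is 16, SS between is 54 and SS total is 70, with Df within = 6 – 2 = 4 and Df between = 2 – 1 = 1.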