Week 1
Correlation: association
Causality: effect
1. Covariance: variables have an association
2. Directionality: cause precedes effect
3. Internal validity: eliminate alternative explanations
Scatterplots:
- Direction: positive/negative
- Strength: the more tightly the points cluster around a straight line, the stronger the relation
- Shape: linear/nonlinear, homogeneous/heterogeneous
- Outliers
Covariance
= used to measure degree to which 2 variables vary together.
Formula: $\mathrm{cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$
→ provides info on strength and direction of association.
Disadvantage: its value depends on the units of measurement of the variables.
Solution: standardize by dividing the covariance by the product of the two standard deviations (sX * sY).
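A minimal Python sketch of this problem and its solution, using only the standard library. The height/weight numbers and the helper names `covariance` and `sd` are made up for illustration. Expressing height in centimetres instead of metres multiplies the covariance by 100, while dividing the covariance by the product of the two standard deviations gives the same value in both units (that ratio is Pearson r).

```python
import math

def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations / (N - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sd(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

# Hypothetical data for 5 people
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55, 65, 70, 72, 85]
height_cm = [h * 100 for h in height_m]  # same heights, different unit

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)  # 100 times larger than cov_m

# Standardizing removes the unit: both give the same Pearson r
r_m = cov_m / (sd(height_m) * sd(weight_kg))
r_cm = cov_cm / (sd(height_cm) * sd(weight_kg))
print(f"cov (m): {cov_m:.3f}  cov (cm): {cov_cm:.1f}")
print(f"r (m): {r_m:.3f}  r (cm): {r_cm:.3f}")
```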
Pearson r
= a standardized measure, describes linear relationship between 2 quantitative
variables, between -1 and +1
1. Calculate the z score of each X score and each Y score
2. Multiply zX * zY for each participant separately
3. Add up all these products
4. Divide the sum by N-1
Formula: $r = \frac{\sum z_X z_Y}{N - 1}$
or $r = \frac{\mathrm{cov}_{XY}}{s_X s_Y}$
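The four steps can be sketched in Python as follows (standard library only; the data and the function names `z_scores` and `pearson_r` are hypothetical). Each comment marks the step it implements.

```python
import math

def z_scores(values):
    """Step 1: z = (score - mean) / s, with s computed using N - 1."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

def pearson_r(xs, ys):
    zx, zy = z_scores(xs), z_scores(ys)         # step 1: z scores
    products = [a * b for a, b in zip(zx, zy)]  # step 2: zX * zY per participant
    total = sum(products)                       # step 3: add them up
    return total / (len(xs) - 1)                # step 4: divide by N - 1

x = [1, 2, 3, 4, 5]  # hypothetical scores on X
y = [2, 4, 5, 4, 5]  # hypothetical scores on Y
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

For these scores r = 6/√60 ≈ .775; for a perfectly linear pair such as X = 1, 2, 3 and Y = 2, 4, 6 the same function returns 1 (up to rounding).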
Beware of:
- non-linear relationships
- outliers
- heterogeneous subgroups
- restriction of range
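A small illustration of the outlier warning, assuming made-up data and a hypothetical `pearson_r` helper (computed from deviation sums, which gives the same value as the z-score route). Five points on a perfect positive line give r = 1; adding a single extreme participant turns r negative (about -.44).

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviation sums (equivalent to the z-score route)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect positive linear relation...
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
r_clean = pearson_r(x, y)

# ...plus one extreme participant (X = 6, Y = -10)
r_outlier = pearson_r(x + [6], y + [-10])
print(f"without outlier: r = {r_clean:.2f}; with outlier: r = {r_outlier:.2f}")
```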
Spearman’s rho (ρ)
= describes relation between 2 ordinal variables/ranked scores.
Scores not ranked yet? Convert raw scores into ranks. Then use Pearson correlation
to calculate rs
rs = r on ranked data
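Mean of the ranks: $\bar{R} = \frac{N + 1}{2}$
Standard dev. of the ranks (no ties, N - 1 in the denominator): $s_R = \sqrt{\frac{N(N + 1)}{12}}$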
It’s an alternative to Pearson r in case of outliers/weak non-linearity.
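A sketch of rs as "Pearson r on ranked data", assuming the common convention that tied scores receive the mean of the ranks they would occupy (the notes do not say how ties are handled). The helper names are hypothetical, and so are the data: Y always increases with X but not linearly, and one value is extreme, so rs = 1 while Pearson r is clearly lower (about .80).

```python
import math

def ranks(values):
    """Convert raw scores to ranks 1..N; ties get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over every score tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson_r(xs, ys):
    """Pearson r from deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(xs, ys):
    """rs = Pearson r computed on the ranked scores."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: monotonic but non-linear, with one extreme Y value
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rs = {spearman_rs(x, y):.3f}")
```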
Point-biserial correlation (rpb)
One variable is dichotomous and the other is quantitative.
Use the Pearson r formula to calculate rpb:
rpb = r
Relationship between rpb and t for independent samples: $r_{pb} = \sqrt{\frac{t^2}{t^2 + df}}$, with $df = N - 2$
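To check rpb = r and its link with the independent-samples t test, the sketch below codes group membership as 0/1, computes Pearson r on those codes, and compares it with $\sqrt{t^2/(t^2 + df)}$, where t is Student's t with pooled variance and df = N - 2. The scores and the helper names `pearson_r` and `independent_t` are hypothetical. Coding the higher-scoring group as 1 makes rpb positive; the square-root expression only gives the size, so the sign has to be read from the group means.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def independent_t(group_a, group_b):
    """Student's t for two independent groups with pooled variance."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    ss_a = sum((v - ma) ** 2 for v in group_a)
    ss_b = sum((v - mb) ** 2 for v in group_b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))

# Hypothetical scores; group membership coded 0 / 1
group0 = [3, 4, 5, 4]
group1 = [6, 7, 5, 8]
codes = [0] * len(group0) + [1] * len(group1)
scores = group0 + group1

r_pb = pearson_r(codes, scores)  # rpb = r on the 0/1 codes
t = independent_t(group0, group1)
df = len(scores) - 2
r_from_t = math.sqrt(t ** 2 / (t ** 2 + df))
print(f"rpb = {r_pb:.3f}, t({df}) = {t:.2f}, sqrt(t^2/(t^2+df)) = {r_from_t:.3f}")
```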
Phi coefficient (φ)
= describes relationship between 2 dichotomous variables.
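Use the Pearson r formula to calculate φ: φ = r
OR use the formula for a 2×2 table with cell frequencies a, b (first row) and c, d (second row): $\varphi = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$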
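A sketch showing that both routes give the same φ, assuming a hypothetical 2×2 table with cell counts a, b in the first row (X = 1) and c, d in the second row (X = 0), with columns Y = 1 and Y = 0. The table is expanded into one 0/1 pair per participant, Pearson r is computed on those codes, and the result is compared with φ = (ad - bc) / √((a + b)(c + d)(a + c)(b + d)). The helper names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def phi_from_table(a, b, c, d):
    """phi for a 2x2 table: a, b = first row (X = 1); c, d = second row (X = 0)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical cell counts; columns are Y = 1 and Y = 0
a, b, c, d = 20, 5, 10, 15

# One (X, Y) pair of 0/1 codes per participant
xs = [1] * (a + b) + [0] * (c + d)
ys = [1] * a + [0] * b + [1] * c + [0] * d

phi_formula = phi_from_table(a, b, c, d)
phi_pearson = pearson_r(xs, ys)
print(f"phi (formula) = {phi_formula:.3f}, phi (Pearson r) = {phi_pearson:.3f}")
```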
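Hypotheses for r: $H_0\colon \rho = 0$, $H_1\colon \rho \neq 0$ (two-sided)
t test for significance of r: $t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$, with $df = N - 2$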
r can be r, rs, rpb, φ
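A minimal sketch of the t test for r, $t = r\sqrt{N - 2}/\sqrt{1 - r^2}$ with df = N - 2, applied to a hypothetical result (r = .45, N = 30); the function name `t_for_r` is illustrative. The critical value 2.048 is the two-sided α = .05 value for df = 28 from a standard t table. The same function is used whether r is r, rs, rpb or φ.

```python
import math

def t_for_r(r, n):
    """t statistic for H0: rho = 0; returns (t, df) with df = N - 2."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical result: r = .45 in a sample of N = 30
t, df = t_for_r(0.45, 30)
CRITICAL_T_DF28 = 2.048  # two-sided, alpha = .05, from a t table
print(f"t({df}) = {t:.2f}, significant: {abs(t) > CRITICAL_T_DF28}")
```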
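Hypotheses for testing the difference between 2 independent r's: $H_0\colon \rho_1 = \rho_2$, $H_1\colon \rho_1 \neq \rho_2$
z test: first convert each r with Fisher's transformation $z_r = \frac{1}{2}\ln\frac{1 + r}{1 - r}$, then $z = \frac{z_{r_1} - z_{r_2}}{\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}}$
Compare |z| with the critical value 1.96 (two-sided test, α = .05)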
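A sketch of the z test for two independent correlations, assuming the usual approach: each r is converted with Fisher's transformation $z_r = \frac{1}{2}\ln\frac{1 + r}{1 - r}$ (identical to `math.atanh(r)`), and the difference is divided by $\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}$. The two samples and the function names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher's r-to-z transformation (identical to math.atanh(r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_two_independent_rs(r1, n1, r2, n2):
    """z statistic for H0: rho1 = rho2 (two independent samples)."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

# Hypothetical samples: r1 = .50 (N1 = 103) and r2 = .20 (N2 = 103)
z = z_two_independent_rs(0.50, 103, 0.20, 103)
significant = abs(z) > 1.96  # two-sided, alpha = .05
print(f"z = {z:.2f}, significant: {significant}")
```

Here z ≈ 2.45, so |z| > 1.96 and the two correlations differ significantly at α = .05.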
The statistical significance depends on N, r, and α
Result:
- Weak correlations in large samples can become significant.
- Strong correlations in small samples might not be significant.
Conclusion: Testing only for significance is too limited.
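A sketch of this point with hypothetical numbers, reusing the t test for r: a weak correlation (r = .10) in a large sample (N = 1000) versus a strong one (r = .50) in a small sample (N = 10). The critical values come from a standard t table for a two-sided test with α = .05 (df = 998: about 1.962; df = 8: 2.306); the names `t_for_r` and `CRITICAL_T` are illustrative.

```python
import math

def t_for_r(r, n):
    """t statistic for H0: rho = 0, with df = N - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2), df

# Critical two-sided t values for alpha = .05 (standard t table)
CRITICAL_T = {998: 1.962, 8: 2.306}

t_weak, df_weak = t_for_r(0.10, 1000)    # weak r, large sample
t_strong, df_strong = t_for_r(0.50, 10)  # strong r, small sample

for label, t, df in [("r = .10, N = 1000", t_weak, df_weak),
                     ("r = .50, N = 10", t_strong, df_strong)]:
    significant = abs(t) > CRITICAL_T[df]
    print(f"{label}: t({df}) = {t:.2f}, significant: {significant}")
```

The weak correlation gives t of about 3.2 and is significant; the strong one gives t of about 1.6 and is not.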
Measures of effect size:
1) $r_{\text{effect}}$
Can stand for r, rs, rpb, and φ.
Disadvantage: the value of a correlation is hard to interpret:
r = .60 does NOT mean the relationship is twice as strong as r = .30.
Solution: Square r.
2) r², also called the Coefficient of Determination (COD) or the Proportion of Variance Accounted For (VAF)
Advantage: r² values can be compared with each other.
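A tiny Python sketch with hypothetical correlations shows why squaring helps comparison: doubling r from .30 to .60 quadruples r² (from .09 to .36), and a small r such as .10 leaves only r² = .01.

```python
# Hypothetical correlations and the proportion of variance they account for
correlations = [0.10, 0.30, 0.60]
for r in correlations:
    print(f"r = {r:.2f} -> r^2 = {r ** 2:.2f}")

ratio_r = 0.60 / 0.30             # r is twice as large...
ratio_r2 = 0.60 ** 2 / 0.30 ** 2  # ...but r^2 is four times as large
print(f"ratio of r: {ratio_r:.1f}, ratio of r^2: {ratio_r2:.1f}")
```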
Disadvantages:
- still hard to interpret.
- “determination” erroneously implies causality.
- r² gives no information about the direction of the relationship.
- small values of r give even smaller values of r².