Summary Quantitative
Lecture 1 – Introduction, refreshing, correlation and
simple regression analysis
Relation is possible in 3 ways:
- Positively related: + -> +
- Negatively related: + -> -
- Not related at all
Modelling relationships:
- Outcome_i = (model) + error_i
- X = predictor variable
- Outcome_i = (b X_i) + error_i
- If there is only one X in the model -> the standardized b = Pearson
product-moment correlation coefficient, denoted by r
Covariance (p. 264)
- A measure of the ‘average’ relationship between two variables.
- The average cross-product deviation
- The variance (s²) represents: the average squared amount by which the data
deviate from the mean
- When one variable deviates from its mean, we would expect the other
variable to deviate from its mean in a similar way
- Cross-product deviations: a measure of the total relationship between two
variables
- The average of the cross-product deviations (their sum divided by N - 1) is
known as the covariance
- Positive covariance: as one variable deviates from the mean, the other
variable deviates in the same direction
- Negative covariance: as one variable deviates from the mean (e.g.
increases), the other deviates from the mean in the opposite direction (->
decreases)
- The covariance depends upon the scales of measurement used: it is not a
standardized measure
- Because of this, we cannot say whether a covariance is large or small.
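The covariance steps above can be sketched in a few lines of Python. This is a minimal illustration, not a textbook procedure: the function name `covariance` and the data are made up for the example, and the sum of cross-product deviations is divided by N - 1 (the usual sample version).

```python
def covariance(x, y):
    """Sample covariance: average cross-product deviation (divided by N - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-product deviation per case: (x_i - mean_x) * (y_i - mean_y)
    cross_products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(cross_products) / (n - 1)


# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_original = covariance(x, y)
# Same data, but x expressed in a unit that is 100 times smaller
cov_rescaled = covariance([xi * 100 for xi in x], y)

print(cov_original, cov_rescaled)
```

Rescaling x changes the covariance by the same factor, which is why the raw covariance cannot be judged as large or small.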
Standardization (p. 266)
- The process of converting a variable into a standard unit of measurement
- Typically used: standard deviation unit
- It allows us to compare data when different units of measurement have
been used (e.g. kg and inches)
- The standardized covariance = Pearson’s correlation coefficient = r
- Value between -1 and +1
- -1: perfect negative correlation
- +1: perfect positive correlation
- Commonly used measure of an effect size (.1 = small, .3 = medium, .5 =
large)
- This is all about bivariate correlation (between 2 variables)
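Standardizing the covariance can be sketched as follows: divide the covariance by the product of the two standard deviations. The function name `pearson_r` and the data are assumptions made for this example.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (sd_x * sd_y)


# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_original = pearson_r(x, y)
# Changing the unit of x leaves r unchanged, unlike the covariance
r_rescaled = pearson_r([xi * 100 for xi in x], y)

print(round(r_original, 3), round(r_rescaled, 3))
```

Because r is expressed in standard deviation units, it stays the same whatever units the two variables were measured in.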
Significance of the correlation coefficient (p. 268)
- Test the hypothesis that the correlation is different from 0
- This can be tested with a z-score, but also with a t-score
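Both tests can be sketched in Python. The assumptions here: the z-test uses Fisher's z-transformation of r with standard error 1/sqrt(N - 3), and the t-statistic has N - 2 degrees of freedom. The function names and the values r = .5, N = 28 are invented for the example.

```python
import math


def t_statistic(r, n):
    """t-statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def z_test(r, n):
    """z-score and two-tailed p-value for H0: correlation = 0 (Fisher's z)."""
    z_r = math.atanh(r)          # Fisher's z-transformation of r
    se = 1 / math.sqrt(n - 3)    # standard error of z_r
    z = z_r / se
    # Two-tailed area under the standard normal curve beyond |z|
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed


# Hypothetical correlation and sample size
r, n = 0.5, 28
t = t_statistic(r, n)
z, p = z_test(r, n)
print(round(t, 2), round(z, 2), round(p, 4))
```

The t-value still has to be compared with a t-distribution with N - 2 degrees of freedom (from a table or statistical software); the standard library has no function for that.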
Confidence intervals for r
- They tell us something about the likely value in the population
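- Lower boundary (95% interval): x - (1.96 × SE)
- Upper boundary (95% interval): x + (1.96 × SE)
- For r, x is the z-transformed correlation; the boundaries are converted
back to r afterwards
- Not possible in SPSS?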
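Since it is unclear whether SPSS reports this interval, here is a sketch of how it could be computed by hand. It assumes the common approach for r: transform r to Fisher's z, use SE = 1/sqrt(N - 3), build the interval on the z scale, and convert the boundaries back to r. The function name and the values r = .5, N = 28 are invented for the example.

```python
import math


def confidence_interval_r(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r via Fisher's z-transformation."""
    z_r = math.atanh(r)             # r on the z scale
    se = 1 / math.sqrt(n - 3)       # standard error on the z scale
    lower_z = z_r - z_crit * se     # lower boundary: x - (1.96 * SE)
    upper_z = z_r + z_crit * se     # upper boundary: x + (1.96 * SE)
    # Convert both boundaries back to the correlation scale
    return math.tanh(lower_z), math.tanh(upper_z)


# Hypothetical correlation and sample size
lower, upper = confidence_interval_r(0.5, 28)
print(round(lower, 3), round(upper, 3))
```

If the interval does not contain 0, this agrees with a significant correlation at the .05 level.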
You can’t say anything about causality (cause and effect) (p. 270), because
of:
- The third variable problem: causality between two variables cannot be
assumed. There may be other measured or unmeasured variables affecting
the results
- Direction of causality: a correlation says nothing about which variable
influences the other. It gives no indication of the direction.
Sources of bias (p. 271)
- Linearity: if the relationship between variables is not linear, then this
model is invalid -> the outcome needs to be measured on an interval scale
- Normality: we care about this only if we want confidence intervals or
significance tests and if the sample size is small -> check with a P-P plot
Pearson’s r in SPSS (p. 275)
- R² = coefficient of determination
- A measure of the amount of variability in one variable that is shared by the
other
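A small sketch of what R² means in practice: squaring r gives the proportion of variability the two variables share. The function name `shared_variance` is made up; the r values are the effect-size benchmarks from these notes.

```python
def shared_variance(r):
    """Coefficient of determination: proportion of variability shared (r squared)."""
    return r ** 2


# Effect-size benchmarks for r: small, medium, large
for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: about {shared_variance(r) * 100:.0f}% shared variability")
```

Even a 'large' correlation of .5 leaves roughly three quarters of the variability unexplained.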
Partial correlation (p. 281)
- A measure of the relationship between two variables while controlling for
the effect of one or more additional variables on both
- With this we can address the third variable problem to some degree
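For one control variable, the partial correlation can be computed directly from the three bivariate correlations. The sketch below uses the standard first-order formula; the function name and the correlation values are invented for the example.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with the effect of z removed from both."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical bivariate correlations
r_xy, r_xz, r_yz = 0.5, 0.4, 0.3
pr = partial_correlation(r_xy, r_xz, r_yz)
print(round(pr, 3))
```

In this made-up case the correlation between x and y drops once z is controlled for, which is what a partly shared third variable would produce.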
Semi-partial correlation (p. 285)
- A measure of the relationship between two variables while controlling for
the effect that one or more additional variables have on one of these variables
- The effect of the third variable is removed from only one variable
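The difference with the partial correlation shows up in the denominator: only the variance that z shares with one variable (here x) is removed. This is a sketch using the standard first-order formula; the function name and the correlation values are invented for the example.

```python
import math


def semi_partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between y and x, with the effect of z removed from x only."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt(1 - r_xz ** 2)   # only x is adjusted for z
    return numerator / denominator


# Hypothetical bivariate correlations
r_xy, r_xz, r_yz = 0.5, 0.4, 0.3
sr = semi_partial_correlation(r_xy, r_xz, r_yz)
print(round(sr, 3))
```

With the same made-up correlations, the semi-partial value is smaller in absolute size than the partial one, because y keeps the variability it shares with z.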
Regression (p. 294)
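- Outcome_i = y_i = (b0 + b1 X_i) + ε_i
- This equation keeps the fundamental idea that an outcome for a person
can be predicted from a model
- The form of the equation is a straight line (y = ax + b, where a is the
slope and b the intercept)
- b0 = the intercept of the line (the predicted outcome when X = 0)
- b1 = the slope (the change in the predicted outcome per unit change in X)
- b0 + b1 X_i = predicted value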
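As a small illustration of the equation, the sketch below computes a predicted value and the error for one person. The values of b0 and b1 and the observation are made up; they are not estimates from any data in these notes.

```python
def predict(b0, b1, x):
    """Predicted value from the straight-line model: b0 + b1 * x."""
    return b0 + b1 * x


# Hypothetical intercept and slope
b0, b1 = 2.2, 0.6

# Hypothetical person: predictor score 4, observed outcome 4
x_i, y_i = 4, 4
predicted = predict(b0, b1, x_i)
error = y_i - predicted   # the part of the outcome the model does not capture

print(round(predicted, 2), round(error, 2))
```

The observed outcome is the predicted value plus the error, exactly as in the equation.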
Estimating the model (p. 298)
- Residual: the difference between the observed data and what the model
predicts
- Residual Sum of Squares = SSR: a measure of the variability that cannot
be explained by the model fitted to the data; a gauge of how well a
particular line fits the data
- Ordinary Least Squares = OLS: a method of regression in which the
parameters of the model are estimated using the method of least squares,
i.e. by choosing the b-values that minimize SSR
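The estimation idea can be sketched for a single predictor. This example uses the closed-form least-squares solution for a straight line (slope = sum of cross-product deviations divided by the sum of squared deviations of X). The function names and the data are invented for the example.

```python
def fit_ols(x, y):
    """Least-squares estimates (b0, b1) for a straight line through the data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_sq_x = sum((a - mean_x) ** 2 for a in x)
    b1 = sum_cross / sum_sq_x        # slope
    b0 = mean_y - b1 * mean_x        # intercept: line passes through the means
    return b0, b1


def residual_sum_of_squares(x, y, b0, b1):
    """SSR: sum of squared differences between observed and predicted values."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_ols(x, y)
ssr_best = residual_sum_of_squares(x, y, b0, b1)
# Any other line, e.g. intercept 2.0 and slope 0.7, fits worse (larger SSR)
ssr_other = residual_sum_of_squares(x, y, 2.0, 0.7)

print(round(b0, 2), round(b1, 2), round(ssr_best, 2), round(ssr_other, 2))
```

Comparing the two SSR values shows what 'least squares' means: no other straight line gives a smaller residual sum of squares for these data.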