Lecture 1
Regression analysis
- Technique to understand and quantitatively summarize relationships among variables
It concerns the relationship between two kinds of variables:
- Y = dependent (variable to be explained)
- X = independent (explanatory variable)
- Regress Y on X: is the relationship positive or negative, how large is it, and is it statistically significant (unlikely to be due to chance alone)?
- A causal effect of X on Y is often hypothesized (expected), but it is not necessarily true; the hypothesized effect can be positive or negative
Correlation
- Correlation coefficient
- Perfect positive correlation: +1
- No correlation: 0
- Perfect negative correlation: -1
Correlation coefficient (rho/r)
- Degree or strength of a (linear) association between two variables
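- Standardized covariation between two variables (X and Y)
- Standardized with respect to scale: the covariation is divided by the variation in X and the variation in Y, so the measurement units drop out
- Because of this standardization the correlation always ranges between -1 and +1
- Computed with the formula for r: the covariance of X and Y divided by the square root of the product of the variance of X and the variance of Y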
Variance: degree to which the scores differ (how far they lie from the mean)
- Never negative (zero only when all scores are equal)
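Products of deviations (deviation = score minus the mean)
- Covariance (X,Y): sum of the products of the deviations in X and in Y over all data points
- Variance (X): sum of squared deviations in X
- Variance (Y): sum of squared deviations in Y
- Each sum is divided by n - 1 to obtain the sample covariance or variance; in the formula for r this divisor cancels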
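The definitions above can be turned into a small calculation. The following is a minimal sketch in plain Python, not the course's own material: the function names and the `hours`/`grade` data are made up for illustration. It computes r from the sums of deviation products, leaving out the n - 1 divisor because it cancels.

```python
from math import sqrt

def sum_of_products(xs, ys):
    """Sum of the products of deviations from the mean of xs and of ys."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

def correlation(xs, ys):
    """Pearson r: covariation of X and Y scaled by the variation in each."""
    sp_xy = sum_of_products(xs, ys)   # covariation of X and Y
    ss_x = sum_of_products(xs, xs)    # sum of squared deviations in X
    ss_y = sum_of_products(ys, ys)    # sum of squared deviations in Y
    return sp_xy / sqrt(ss_x * ss_y)

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
grade = [2, 4, 5, 4, 5]
r = correlation(hours, grade)
print(round(r, 3))
```

Rescaling either variable (for example, hours to minutes) leaves r unchanged, which is what "standardized with respect to scale" means in practice.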
Regression
- Line through the cloud of points in a scatterplot: regression estimates that line and tells how well it fits the points
- Correlation cannot distinguish between lines with different slopes; regression can
- Correlation only describes the LINEAR relationship
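To make the point about slopes concrete, here is a small sketch with two hypothetical data sets that lie exactly on a line. Both have the same correlation but very different slopes. The helper `slope_and_r` and the data are assumptions introduced for illustration.

```python
def slope_and_r(xs, ys):
    """Return (regression slope, correlation) computed from deviation sums."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    slope = sp_xy / ss_x                   # change in Y per unit of X
    r = sp_xy / (ss_x * ss_y) ** 0.5       # strength of linear association
    return slope, r

xs = [1, 2, 3, 4, 5]
shallow = [0.5 * x for x in xs]   # points exactly on a flat-ish line
steep = [3 * x for x in xs]       # points exactly on a steep line

shallow_slope, shallow_r = slope_and_r(xs, shallow)
steep_slope, steep_r = slope_and_r(xs, steep)
print(shallow_slope, shallow_r)
print(steep_slope, steep_r)
```

The correlation only says how tightly the points follow a line. The slope, which says how much Y changes per unit of X, comes from the regression.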
Significance testing
1) Ensure that assumptions are met
2) Formulate hypotheses
3) Determine the critical region from the appropriate sampling distribution
4) Calculate the test statistic
5) Make a decision (reject or do not reject the null hypothesis)
6) State the conclusion
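As an illustration of the six steps, the sketch below tests whether a correlation differs from zero, using the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2). The notes do not specify which test is used, so this choice, the data, and the significance level of 0.05 are assumptions. The critical value 2.306 is the tabulated two-sided value for 8 degrees of freedom and must be looked up again for other sample sizes.

```python
from math import sqrt

# Step 1: assumptions (e.g. a roughly linear relationship) are taken as
# given for this hypothetical data; the snippet does not check them.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
n = len(x)

# Step 2: H0: rho = 0 (no linear association), H1: rho != 0 (two-sided).

# Step 3: critical region for alpha = 0.05, two-sided, df = n - 2 = 8.
T_CRITICAL = 2.306  # from a t table; valid only for df = 8, alpha = 0.05

# Step 4: test statistic.
mean_x, mean_y = sum(x) / n, sum(y) / n
sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
ss_x = sum((a - mean_x) ** 2 for a in x)
ss_y = sum((b - mean_y) ** 2 for b in y)
r = sp_xy / sqrt(ss_x * ss_y)
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Step 5: decision.
reject_h0 = abs(t) > T_CRITICAL

# Step 6: conclusion.
if reject_h0:
    print("Reject H0: the correlation differs significantly from zero.")
else:
    print("Do not reject H0: no significant linear association found.")
```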
Lecture 2
Hypothesis testing
Regression equation: Y = a + bX + e
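a = intercept (where the line crosses the Y-axis)
b = slope
e = error term (distance between an observed point and the line)
- The line is chosen so that it minimizes the sum of squared vertical distances between the points and the line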
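A minimal sketch of how a and b can be obtained for a single X, assuming the standard least-squares solution: b is the sum of deviation products of X and Y divided by the sum of squared deviations in X, and a is mean(Y) - b * mean(X). The data and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX + e."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp_xy / ss_x            # slope
    a = mean_y - b * mean_x     # intercept: the line passes through the means
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
grade = [2, 4, 5, 4, 5]

a, b = fit_line(hours, grade)
best = sum_squared_errors(hours, grade, a, b)
print(a, b, best)

# Any other line gives a larger sum of squared errors.
print(sum_squared_errors(hours, grade, a, b + 0.1) > best)
print(sum_squared_errors(hours, grade, a + 0.1, b) > best)
```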