Correlation and Regression
Covariance : measures the linear relationship between 2 random variables
$\operatorname{Cov}(X,Y) = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
Correlation coefficient : strength of the linear relationship
$r = \dfrac{\operatorname{Cov}(X,Y)}{s_X \, s_Y}$, with $-1 \le r \le +1$
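The two formulas above can be computed directly. A minimal sketch in Python, using a small hypothetical dataset chosen for easy arithmetic:

```python
import math

# Hypothetical illustrative dataset (values chosen for easy arithmetic)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def covariance(a, b):
    """Sample covariance: sum of cross-deviations divided by (n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def correlation(a, b):
    """r = Cov(a, b) / (s_a * s_b); always lies between -1 and +1."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

print(covariance(x, y))   # 1.5
print(correlation(x, y))  # ~0.7746 -> positive linear relationship
```

Note that `covariance(a, a)` is simply the sample variance, so the same helper serves both formulas.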
Scatter plot : collection of points on a graph, each representing the values of 2 variables (X and Y)
‐ Upward scatter plot : positive correlation
‐ Downward scatter plot : negative correlation
Limitations of correlation analysis :
1. Outliers : extreme values for sample observations
→ statistical evidence that a significant relationship exists when there is none, or
→ no relationship when there is one
2. Spurious correlation : variables may appear to have a relationship when there is none
3. Correlation only measures linear relationships, not non‐linear relationships
Test the correlation between 2 variables : use a t‐test to test whether the correlation between the 2 variables = 0
$H_0 : \rho = 0 \quad H_a : \rho \ne 0$
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with $df = n - 2$
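A quick sketch of this test statistic, reusing the hypothetical values r = 0.7746 and n = 5 from the earlier example:

```python
import math

def corr_t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# r = 0.7746 from a hypothetical sample of n = 5 observations
t = corr_t_stat(0.7746, 5)
print(t)  # ~2.121
# Compare |t| with the two-tailed critical value; for df = 3 and
# alpha = 0.05 the t-table gives 3.182, so here H0: rho = 0
# would not be rejected at the 5% level.
```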
Dependent / Independent variables :
Dependent variable : variable whose variation is explained by the independent variable(s)
Independent variable : variable used to explain the variation of the dependent variable
Assumptions of linear regression : 1. A linear relationship exists between the dependent and independent variables
2. Independent variable : uncorrelated with residuals
3. Expected value of residual term E(ε) = 0
4. Variance of the residual term is constant for all observations
5. Residual term is independently distributed (residual for observation A is not correlated with residual for observation B)
6. Residual term is normally distributed
Linear regression model :
$Y_i = b_0 + b_1 X_i + \varepsilon_i$
In which : $b_0$ = intercept, $b_1$ = slope coefficient, $\varepsilon_i$ = residual (error) term
Linear equation for regression line :
$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$
In which : $\hat{b}_0$, $\hat{b}_1$ = estimated regression coefficients
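The estimated coefficients can be computed from the standard least-squares formulas ($\hat{b}_1 = \operatorname{Cov}(X,Y)/\operatorname{Var}(X)$, $\hat{b}_0 = \bar{Y} - \hat{b}_1\bar{X}$). A minimal sketch with a hypothetical dataset:

```python
def fit_ols(x, y):
    """Estimate intercept b0 and slope b1 by ordinary least squares:
    b1 = Cov(X, Y) / Var(X), b0 = Ybar - b1 * Xbar."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical dataset chosen for easy arithmetic
b0, b1 = fit_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1)  # 2.2 0.6 -> regression line: Y-hat = 2.2 + 0.6 X
```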
Confidence interval for regression slope coefficient (range of $b_1$) :
$\hat{b}_1 - t_c \times s_{\hat{b}_1} \le b_1 \le \hat{b}_1 + t_c \times s_{\hat{b}_1}$ or $\hat{b}_1 \pm t_c \times s_{\hat{b}_1}$
In which:
$t_c = t_{\alpha/2}$ = two‐tailed critical t‐value with $df = n - 2$; $s_{\hat{b}_1}$ = standard error of the slope coefficient
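Continuing the hypothetical example (fitted line $\hat{Y} = 2.2 + 0.6X$), the interval can be sketched as follows; the critical value 3.182 is the standard two-tailed t-table entry for df = 3, α = 0.05:

```python
import math

# Hypothetical data; fitted line Y-hat = 2.2 + 0.6 X (least squares)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, b0, b1 = len(x), 2.2, 0.6

mx = sum(x) / n
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # 2.4
see = math.sqrt(sse / (n - 2))                          # std error of estimate
s_b1 = see / math.sqrt(sum((xi - mx) ** 2 for xi in x)) # std error of slope

t_c = 3.182  # two-tailed critical t, alpha = 0.05, df = n - 2 = 3 (t-table)
lower, upper = b1 - t_c * s_b1, b1 + t_c * s_b1
print(lower, upper)  # interval contains 0 -> cannot reject H0: b1 = 0 at 5%
```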
Test hypothesis that slope coefficient = hypothesized value : from the confidence interval for the slope coefficient → use a t‐test :
$t = \dfrac{\hat{b}_1 - b_1}{s_{\hat{b}_1}}$, with $df = n - 2$
Predicted value (Y) / Confidence intervals for predicted value (Y) :
Predicted value : value of the dependent variable based on
‐ estimated regression coefficients; and
‐ predicted value of the independent variable.
$\hat{Y} = \hat{b}_0 + \hat{b}_1 X$
Confidence interval for predicted value (Y) :
$\hat{Y} \pm t_c \times s_f$
In which:
$t_c = t_{\alpha/2}$ with $df = n - 2$; $s_f$ = standard error of the forecast :
$s_f^2 = SEE^2 \left[ 1 + \dfrac{1}{n} + \dfrac{(X - \bar{X})^2}{(n-1)\,s_X^2} \right]$
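A sketch of the forecast interval for the same hypothetical example, predicting Y at X = 4 (again using the df = 3, α = 0.05 critical value 3.182 from a t-table):

```python
import math

# Hypothetical data; fitted line Y-hat = 2.2 + 0.6 X, SSE = 2.4
x = [1, 2, 3, 4, 5]
n, mx = 5, 3.0
see = math.sqrt(2.4 / (n - 2))
sxx = sum((xi - mx) ** 2 for xi in x)  # equals (n - 1) * s_x^2 = 10

x_new = 4
y_hat = 2.2 + 0.6 * x_new              # predicted value = 4.6

# Standard error of the forecast: wider than SEE because it also
# reflects uncertainty in the estimated coefficients
s_f = see * math.sqrt(1 + 1 / n + (x_new - mx) ** 2 / sxx)

t_c = 3.182  # two-tailed critical t, alpha = 0.05, df = n - 2 = 3 (t-table)
print(y_hat - t_c * s_f, y_hat + t_c * s_f)  # interval around the forecast
```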
Analysis of variance (ANOVA) : analyses the total variability of the dependent variable
1. Total sum of squares (SST) : total variation in the dependent variable
2. Regression sum of squares (RSS) : variation in the dependent variable that is explained by the independent variable
3. Sum of squared errors (SSE) : unexplained variation in the dependent variable
SST = RSS + SSE
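The decomposition can be verified numerically on the hypothetical dataset with fitted line $\hat{Y} = 2.2 + 0.6X$:

```python
# ANOVA decomposition for a hypothetical dataset, fitted line Y-hat = 2.2 + 0.6 X
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(y)
my = sum(y) / n
y_hat = [2.2 + 0.6 * xi for xi in x]

sst = sum((yi - my) ** 2 for yi in y)                  # total variation
rss = sum((yh - my) ** 2 for yh in y_hat)              # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

print(sst, rss, sse)  # 6.0 3.6 2.4 -> SST = RSS + SSE
```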
Standard error of estimate / Coefficient of determination :
Standard error of estimate (SEE) : degree of variability of the actual Y relative to the estimated Y
‐ Low SEE → strong relationship (low variability)
$SEE = \sqrt{\dfrac{SSE}{n-2}}$
Coefficient of determination ($R^2$) : % of the total variation in the dependent variable that is explained by the independent variable
$R^2 = \dfrac{RSS}{SST}$
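Both measures follow directly from the ANOVA sums of squares; using the hypothetical figures SST = 6.0, RSS = 3.6, SSE = 2.4, n = 5:

```python
import math

# Hypothetical ANOVA figures from the running example
sst, rss, sse, n = 6.0, 3.6, 2.4, 5

see = math.sqrt(sse / (n - 2))  # standard error of estimate
r2 = rss / sst                  # coefficient of determination

print(see)  # ~0.894
print(r2)   # 0.6 -> 60% of the variation in Y is explained by X
# For simple linear regression, R^2 equals the squared correlation:
# 0.7746^2 ~ 0.6
```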
F‐statistic : how well a set of independent variables explains the variation in the dependent variable
$F = \dfrac{MSR}{MSE} = \dfrac{RSS / k}{SSE / (n - k - 1)}$, where $k$ = number of independent variables
(*) The F‐test is always a 1‐tailed test
F‐statistic with 1 independent variable :
$H_0 : b_1 = 0 \quad H_a : b_1 \ne 0$
$F = \dfrac{RSS / 1}{SSE / (n - 2)}$, with $df = 1$ and $n - 2$
Note :
For simple linear regression with 1 independent variable : $F = t_{b_1}^2$
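A sketch verifying the F‐statistic and the $F = t^2$ identity on the hypothetical figures from the running example (RSS = 3.6, SSE = 2.4, n = 5, slope 0.6 with standard error ≈ 0.28284):

```python
# F-test for a simple regression with 1 independent variable,
# using hypothetical figures from the running example
rss, sse, n, k = 3.6, 2.4, 5, 1

msr = rss / k            # mean regression sum of squares
mse = sse / (n - k - 1)  # mean squared error
f = msr / mse

t_b1 = 0.6 / 0.28284     # slope / standard error of slope
print(f)                 # 4.5
print(t_b1 ** 2)         # ~4.5 -> F = t^2 for simple linear regression
```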
Limitations of regression analysis :
1. Parameter instability : linear relationships can change over time → an estimated equation based on data from a specific time period may not be relevant for forecasts / predictions in another time period
2. Usefulness in investment analysis is limited if other market participants are aware of and act on this evidence
3. If the assumptions underlying regression analysis do not hold → interpretation and tests of hypotheses may not be valid
‐ Heteroskedasticity : non‐constant variance of the error terms
‐ Autocorrelation : error terms are not independent