Regression Estimator Properties - Unbiasedness: This is the property that the expectation of the
estimator is exactly the true parameter. What this means is that Beta_1_hat is an unbiased estimator
of Beta_1, i.e. E[Beta_1_hat] = Beta_1
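A minimal simulation sketch of what unbiasedness means in practice: averaging the OLS slope estimate over many repeated samples recovers the true slope. All numbers here (true Beta_0 = 2, Beta_1 = 3, noise sd = 1) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 2.0, 3.0
x = np.linspace(0, 10, 50)          # fixed design points

estimates = []
for _ in range(2000):
    y = beta0 + beta1 * x + rng.normal(0, 1, size=x.size)
    # OLS slope estimate: Sxy / Sxx
    b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1_hat)

print(np.mean(estimates))  # close to the true beta_1 = 3
```

Each individual b1_hat misses the truth, but the average over many samples does not -- that is exactly E[Beta_1_hat] = Beta_1.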
Model Parameter Interpretation - A positive value for Beta_1 is consistent with a direct (positive)
relationship between the predicting variable X and the response variable Y
Regression Analysis - Regression analysis is a simple way to investigate the relationship between two
or more variables in a non-deterministic way.
1. Response/Target Variable (Y) - This is the variable we're interested in understanding, modeling, or
testing
This is a random variable. It varies with changes in the predictor(s)
2. Predicting/Explanatory (Independent) Variables (Xs ~ X1, X2) - These are variables we think
might be useful in predicting or modeling the response variable
These are treated as fixed variables. They do not change with the response
Simple Linear Regression - We fit a straight line that does not pass perfectly through the points
The objective is to fit a non-deterministic linear model between the predicting variable X and the response variable Y.
In simple linear regression, we have 3 parameters to estimate: Beta_0 (the intercept), Beta_1 (the slope), and sigma^2 (the error variance).
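A short sketch of estimating all three parameters with the closed-form OLS formulas, on a small invented data set:

```python
import numpy as np

# Invented data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope: Sxy / Sxx, intercept: y_bar - b1 * x_bar
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

# Error variance: sum of squared residuals over (n - 2) degrees of freedom
resid = y - (b0_hat + b1_hat * x)
sigma2_hat = np.sum(resid ** 2) / (len(x) - 2)

print(b0_hat, b1_hat, sigma2_hat)
```

Note the n - 2 in the denominator of sigma2_hat: two degrees of freedom are used up estimating Beta_0 and Beta_1, which is what makes this variance estimate unbiased.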
Multiple Linear Regression - With two predictors, the fitted model is a plane rather than a line
Polynomial Regression - We capture a nonlinear relationship in X, although the model remains linear in the parameters
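A sketch of why polynomial regression is still linear regression: we simply add powers of x as extra columns of the design matrix and solve the same least-squares problem. The quadratic "truth" and noise level below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 40)
# Invented quadratic data-generating process: y = 1 + 0.5x - 2x^2 + noise
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(0, 0.3, size=x.size)

# Design matrix with columns [1, x, x^2]: linear in the parameters
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)   # roughly [1.0, 0.5, -2.0]
```

The nonlinearity lives entirely in the columns of X; the coefficients still enter linearly, so all the usual linear-regression machinery applies.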
Objectives of Linear Regression - 1. Prediction: We want to see how the response variable
behaves in different settings
2. Modeling: We are interested in modeling the relationship between the response variable and the
explanatory/predicting variables
3. Testing: We are also interested in testing hypotheses about association relationships.
Simple Linear Regression Assumptions - • Linearity/Mean Zero Assumption: This means that the
expected value of the errors is zero
• Constant Variance Assumption: This means that the variance of the error term is the same constant
sigma_squared across all error terms
• Independence Assumption: This means that the error terms are independent random variables, i.e.
the deviations (and hence the responses Y) are independently drawn from the data-generating process -- the
fact that the model under-predicts Y for one particular case cannot tell you anything at all about what it
does for any other case
• Normal Assumption: The errors are assumed to be normally distributed.
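The assumptions above can be probed numerically through the residuals of a fit. A rough sketch on simulated data that satisfies all four assumptions (data and comparisons are illustrative, not formal hypothesis tests):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 100)
# Invented model that satisfies the assumptions: iid N(0, 1) errors
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=x.size)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Mean-zero: OLS residuals with an intercept average to exactly zero
print(np.mean(resid))

# Constant variance: residual spread on the left vs right half of the x range
left, right = resid[:50], resid[50:]
print(np.var(left), np.var(right))   # similar magnitudes are consistent with homoscedasticity
```

In practice these checks are usually done graphically (residuals vs fitted values, normal Q-Q plot), but the idea is the same: look at the residuals as stand-ins for the unobservable errors.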
Linearity Assumption - A violation of this assumption will lead to difficulties in estimating Beta_0 and Beta_1, and
means that your model does not include a necessary systematic component
Constant Variance Assumption - This means that the model cannot be more accurate in some
parts of the data and less accurate in others; the variance has to be constant.
A violation of this assumption means that the estimates are not as efficient as they could be in
estimating the true parameters (better estimates could be calculated), and it also results in poorly calibrated
prediction intervals
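A sketch of what a constant-variance violation looks like in the residuals, using invented data whose noise standard deviation grows with x:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 200)
# Heteroscedastic errors: noise sd grows from 0.2 to 3.2 across the x range
y = 1.0 + 2.0 * x + rng.normal(0, 0.2 + 0.3 * x, size=x.size)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Residual variance is much larger where x is large: the model is less
# accurate in one region of the data, which the assumption rules out
print(np.var(resid[:100]), np.var(resid[100:]))
```

An interval calibrated to the average residual spread would then be too wide for small x and too narrow for large x -- the poorly calibrated prediction intervals described above.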