EXAM STUDY SHEET: QUESTIONS WITH ANSWERS
◍ Response/Target Variable (Y). Answer: This is the variable we are
interested in understanding, modeling, or testing.
It is a random variable: it varies with changes in the predictor(s)
and with random error.
◍ Predicting/Explanatory (Independent) Variables (Xs, e.g. X1, X2).
Answer: These are the variables we think might be useful in predicting
or modeling the response variable.
They are treated as fixed (non-random) variables; they do not change
with the response.
◍ Simple Linear Regression. Answer: We fit a straight line that does
not pass exactly through every data point.
The objective is to fit a non-deterministic linear model (a line plus
random error) between a single predicting variable and Y.
In simple linear regression, we have 3 parameters to estimate: the
intercept β0, the slope β1, and the error variance σ².
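The three parameters can be estimated by ordinary least squares. The sketch below is a minimal illustration in plain Python, assuming the data arrive as paired lists x and y; the function name fit_simple_linear_regression and the toy data are hypothetical. It uses the standard estimates β1 = Sxy/Sxx, β0 = ȳ − β1·x̄, and σ² = SSE/(n − 2).

```python
# Minimal sketch: ordinary least squares for simple linear regression,
# estimating the three parameters beta0, beta1 and sigma^2.
# The function name and the toy data are hypothetical, for illustration only.


def fit_simple_linear_regression(x, y):
    """Return (beta0_hat, beta1_hat, sigma2_hat) for Y = beta0 + beta1*X + error."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    # Residuals: observed values minus fitted values
    residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]
    # Unbiased estimate of sigma^2 divides by n - 2, because two degrees
    # of freedom are used up by estimating beta0 and beta1
    sigma2_hat = sum(r ** 2 for r in residuals) / (n - 2)
    return beta0_hat, beta1_hat, sigma2_hat


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, s2 = fit_simple_linear_regression(x, y)
print(f"beta0={b0:.3f}, beta1={b1:.3f}, sigma^2={s2:.4f}")
```

In practice a statistics library would be used; the hand-written version just makes the three estimates explicit.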
◍ Multiple Linear Regression. Answer: With two predictors, the fitted
model is a plane rather than a line.
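A minimal sketch of fitting that plane, Ŷ = b0 + b1·X1 + b2·X2, by solving the least-squares normal equations (XᵀX)b = Xᵀy with plain Gaussian elimination. The function name fit_plane and the data are hypothetical; the toy y is built exactly from 1 + 2·X1 + 3·X2, so the fit should recover those coefficients.

```python
# Minimal sketch: least-squares plane for two predictors via the normal
# equations. fit_plane and the example data are hypothetical.


def fit_plane(x1, x2, y):
    """Least-squares fit of Y = b0 + b1*X1 + b2*X2; returns [b0, b1, b2]."""
    # Each design-matrix row is [1, x1_i, x2_i]; the leading 1 gives the intercept
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    p = 3  # number of coefficients
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Augmented matrix [X^T X | X^T y], reduced by Gaussian elimination
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        # Partial pivoting: move the row with the largest entry in this column up
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Zero out the entries below the pivot
        for k in range(col + 1, p):
            factor = aug[k][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[k][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system
    coef = [0.0] * p
    for i in range(p - 1, -1, -1):
        known = sum(aug[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (aug[i][p] - known) / aug[i][i]
    return coef


x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 0.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 4.0]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]  # noise-free toy response
b0, b1, b2 = fit_plane(x1, x2, y)
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```

Real data would include random error, so the recovered coefficients would only approximate the true ones; a numerical library is the usual choice over hand-rolled elimination.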
◍ Polynomial Regression. Answer: We use powers of a predictor (e.g. X
and X²) to capture a nonlinear relationship between X and Y.
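As an illustration (a quadratic is assumed here; higher degrees work the same way), the model below is nonlinear in X but still linear in the coefficients, so it can be fitted as a multiple linear regression with predictors X1 = X and X2 = X².

```latex
Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon
```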
◍ Objectives of Linear Regression. Answer: 1. Prediction: We want
to see how the response variable behaves in different settings.
2. Modeling: We are interested in modeling the relationship between
the response variable and the explanatory/predicting variables.
3. Testing: We are also interested in testing hypotheses about
association relationships.
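For the testing objective, a common hypothesis in simple linear regression is H0: β1 = 0 (no linear association between X and Y). A minimal sketch of the t-statistic t = β̂1 / se(β̂1), with se(β̂1) = sqrt(σ̂²/Sxx); the function name slope_t_statistic and the data are hypothetical. The statistic would then be compared with a t distribution with n − 2 degrees of freedom, which this sketch does not compute.

```python
import math

# Minimal sketch: t-statistic for H0: beta1 = 0 in simple linear regression.
# slope_t_statistic and the example data are hypothetical.


def slope_t_statistic(x, y):
    """Return beta1_hat / se(beta1_hat) for the least-squares slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    # Sum of squared residuals and the n - 2 estimate of sigma^2
    sse = sum((yi - beta0_hat - beta1_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)
    # Standard error of the slope estimate
    se_beta1 = math.sqrt(sigma2_hat / sxx)
    return beta1_hat / se_beta1


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t_stat = slope_t_statistic(x, y)
print(f"t = {t_stat:.2f} on {len(x) - 2} degrees of freedom")
```

A large |t| relative to the t(n − 2) critical value is evidence against H0, i.e. evidence of a linear association.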
◍ Simple Linear Regression Assumptions. Answer: • Linearity/Mean
Zero Assumption: The expected value of each error term is zero.
• Constant Variance Assumption: Every error term has the same
variance, σ².
• Independence Assumption: The error terms are independent random
variables, i.e. the deviations (and hence the responses Y) are
independently drawn from the data-generating process. Knowing that
the model under-predicts Y for one particular case should tell you
nothing about what it does for any other case.
• Normality Assumption: The errors are assumed to be normally
distributed.
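The four assumptions can be summarized compactly for the simple linear regression model: N(0, σ²) encodes mean zero, constant variance, and normality, while "iid" (independent and identically distributed) encodes independence.

```latex
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad
\varepsilon_1, \ldots, \varepsilon_n \overset{\text{iid}}{\sim} N(0, \sigma^2)
```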
◍ Linearity Assumption. Answer: A violation of this assumption leads
to difficulties in estimating β0 and means that your model does not
include a necessary systematic component.
◍ Constant Variance Assumption. Answer: The model cannot be more
accurate in some parts of the data and less accurate in others; the
error variance must be constant.
A violation of this assumption means that the estimates are not as
efficient as they could be in estimating the true parameters (better
estimates can be calculated), and it also results in poorly calibrated
prediction intervals.
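One rough, informal way to look for non-constant variance is to sort the residuals by fitted value and compare their spread in the lower and upper halves. The helper spread_ratio below is hypothetical and is not a formal test; a ratio far from 1 only hints that the error variance may change across the data. A residuals-versus-fitted plot is the more usual visual check.

```python
import statistics

# Rough sketch (not a formal test): compare residual spread in the lower
# and upper halves of the fitted values. spread_ratio is hypothetical.


def spread_ratio(fitted, residuals):
    """Variance of upper-half residuals divided by variance of lower-half residuals."""
    pairs = sorted(zip(fitted, residuals))  # order residuals by fitted value
    half = len(pairs) // 2
    if half < 2:
        raise ValueError("need at least 4 residuals")
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]  # middle point dropped when n is odd
    # Raises ZeroDivisionError if the lower-half residuals are all identical
    return statistics.pvariance(upper) / statistics.pvariance(lower)


fitted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]  # spread grows with fit
ratio_demo = spread_ratio(fitted, residuals)
print(f"upper/lower variance ratio = {ratio_demo:.1f}")
```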
◍ Independence Assumption. Answer: Knowing that the model
under-predicts Y for one particular case should not tell you anything
about what it does for any other case.
This violation most often occurs in data that are ordered in time
(time series data), where observations that are near each other in
time are similar to each other.
Violation of this assumption can lead to very misleading assessments
of the strength of the regression.
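For time-ordered data, one standard summary of lag-1 dependence in the residuals is the Durbin-Watson statistic, d = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t². Values near 2 are consistent with no lag-1 autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. A minimal sketch, assuming the residuals are already listed in time order; the function name and the example residuals are hypothetical.

```python
# Minimal sketch: Durbin-Watson statistic for residuals in time order.
# The function name and example residuals are hypothetical.


def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)  # assumes residuals are not all zero
    return num / den


clustered = [1, 1, 1, -1, -1, -1]    # neighbours alike: positive autocorrelation
alternating = [1, -1, 1, -1, 1, -1]  # neighbours flip: negative autocorrelation
print(f"clustered d = {durbin_watson(clustered):.2f}")
print(f"alternating d = {durbin_watson(alternating):.2f}")
```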