Questions and CORRECT Answers
How does the skew of a distribution affect its median relative to its mean? - CORRECT
ANSWER - Left skew -> mean < median (the long left tail pulls the mean down)
No skew (symmetric) -> mean = median
Right skew -> mean > median (the long right tail pulls the mean up)
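A quick way to check the skew rule is to compare the mean and median of small samples. The sketch below uses made-up data chosen only for illustration; the variable names are assumptions, not part of the original notes.

```python
from statistics import mean, median

# Hypothetical samples, invented only to illustrate the rule.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]          # long tail to the right
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]   # mirror image, long tail to the left
symmetric = [1, 2, 3, 4, 5]                       # no skew

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, "mean =", mean(data), "median =", median(data))
```

The single extreme value moves the mean toward the tail, while the median barely changes.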
Define correlation coefficient - CORRECT ANSWER - The correlation coefficient (Pearson's r)
measures the strength and direction of the linear relationship between two variables; it ranges from -1 to 1
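As a minimal sketch of the definition, Pearson's r can be computed by hand as the sum of cross-deviations divided by the product of the root sums of squared deviations. The function name `pearson_r` and the data are assumptions for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    x_dev = sqrt(sum((a - x_bar) ** 2 for a in x))
    y_dev = sqrt(sum((b - y_bar) ** 2 for b in y))
    return cross / (x_dev * y_dev)

x = [1, 2, 3, 4, 5]
r_increasing = pearson_r(x, [2, 4, 6, 8, 10])   # exact increasing line, r close to 1
r_decreasing = pearson_r(x, [10, 8, 6, 4, 2])   # exact decreasing line, r close to -1
r_quadratic = pearson_r(x, [4, 1, 0, 1, 4])     # y = (x - 3)^2, no linear trend

print(r_increasing, r_decreasing, r_quadratic)
```

The quadratic case shows why r only captures linear relationships: y is fully determined by x, yet r is 0.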
Define total deviation, explained deviation, and unexplained deviation for an OLS linear regression
problem. - CORRECT ANSWER - Total deviation: difference between the observed value and the
mean
Explained deviation: difference between the predicted value and the mean
Unexplained deviation (error/residual): difference between the observed value and the predicted value
Define total sum of squares (SST), sum of squared errors (SSE), and sum of squares regression
(SSR) and how they relate to one another. - CORRECT ANSWER - Each is a sum over all observations:
SST = sum of (observation - average)^2
SSE = sum of (observation - prediction)^2
SSR = sum of (prediction - average)^2
SST = SSE + SSR (for an OLS fit with an intercept)
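The decomposition can be checked numerically. The sketch below fits a simple one-predictor OLS line by the usual closed-form slope and intercept, then compares SST with SSE + SSR. The helper `ols_fit` and the data are assumptions for illustration.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, invented for the example.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = ols_fit(x, y)
y_hat = [intercept + slope * xi for xi in x]
y_bar = sum(y) / len(y)

sst = sum((yi - y_bar) ** 2 for yi in y)                # total
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained

print("SST =", sst, "SSE + SSR =", sse + ssr)
```

The two printed values agree up to floating-point rounding.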
Define R^2 and adjusted R^2 - CORRECT ANSWER - R^2 = 1 - SSE/SST = SSR/SST
R^2 = explained variation / total variation, a measure of the overall strength of the relationship
between the dependent and independent variables
Adjusted R^2 adds a penalty based on the number of independent variables (p) and observations
(n)
Adjusted R^2 = 1 - SSE*(n-1)/(SST*(n-p-1))
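A minimal sketch of both formulas as functions. The function names and the input numbers are assumptions for illustration.

```python
def r_squared(sse, sst):
    """R^2 = 1 - SSE/SST."""
    return 1 - sse / sst

def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R^2 = 1 - SSE*(n-1) / (SST*(n-p-1))."""
    return 1 - (sse * (n - 1)) / (sst * (n - p - 1))

# Hypothetical values: 21 observations, 2 independent variables.
r2 = r_squared(sse=20.0, sst=100.0)
adj_r2 = adjusted_r_squared(sse=20.0, sst=100.0, n=21, p=2)

print(r2, adj_r2)  # adjusted R^2 is the smaller of the two here
```

With these inputs the penalty factor is (n-1)/(n-p-1) = 20/18, so adjusted R^2 falls below R^2.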
What values can R^2 occupy? How does one interpret the value of R^2? - CORRECT
ANSWER - 0 <= R^2 <= 1
R^2 = 1: X accounts for all Y variation
R^2 = 0: X accounts for none of the Y variation
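Explain T-value, P-value and F-statistic - CORRECT ANSWER - Null hypothesis (H_0):
coefficient is 0
T-value = coefficient estimate divided by its standard error
P-value: probability, assuming the null hypothesis is true, of observing a T-value at least as
extreme as the one obtained --> reject the null hypothesis if the P-value is low
F-statistic: tests the null hypothesis that all slope coefficients are 0 at once (overall
significance of the regression); a large F-statistic is evidence against that null hypothesis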
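As a sketch of the T-value for the slope in a one-predictor regression: the standard error of the slope is the square root of (SSE/(n-2)) divided by the sum of squared deviations of x. The data and variable names are assumptions for illustration. Converting the T-value to a P-value needs the t distribution with n-2 degrees of freedom, which is left out here.

```python
from math import sqrt

# Hypothetical data, invented for the example.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# OLS estimates for the line y = intercept + slope * x.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

# Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
residual_variance = sse / (n - 2)

se_slope = sqrt(residual_variance / sxx)
t_value = slope / se_slope   # coefficient estimate divided by its standard error

print("slope =", slope, "SE =", se_slope, "t =", t_value)
```

A T-value far from 0, as here, means the estimate is many standard errors away from the null value of 0.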
How do you compute the F-statistic? - CORRECT ANSWER - F-statistic =
(SSR/p) / (SSE/(n-p-1)), where p = number of independent variables and n = number of observations
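The formula translates directly into a small function. The function name and the input numbers are assumptions for illustration.

```python
def f_statistic(ssr, sse, n, p):
    """F = (SSR / p) / (SSE / (n - p - 1))."""
    return (ssr / p) / (sse / (n - p - 1))

# Hypothetical values: 21 observations, 2 independent variables.
f_value = f_statistic(ssr=80.0, sse=20.0, n=21, p=2)
print(f_value)  # (80/2) / (20/18), about 36
```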
How does R^2 change as a function of the number of variables? - CORRECT ANSWER - R^2
will either increase or stay the same as you add more variables; it never decreases.
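This can be seen by fitting the same response with one predictor and then with an added, unrelated predictor. The sketch below is a bare-bones OLS fit via the normal equations in pure Python. The helpers `solve` and `r_squared` and all data are assumptions for illustration, and a real analysis would use a numerical library instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    y_hat = [sum(c * v for c, v in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical data: y depends on x1; "noise" is an unrelated extra predictor.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2.3, 3.8, 6.1, 8.4, 9.7, 12.2, 13.9, 16.1]

r2_one = r_squared([x1], y)
r2_two = r_squared([x1, noise], y)
print(r2_one, r2_two)
```

Even though the extra predictor is unrelated to y, R^2 does not go down, which is why adjusted R^2 is used to compare models with different numbers of variables.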
What are the 3 main assumptions of linear regression? - CORRECT ANSWER - 1. Linearity
assumption: The mean value of Y at each value of X falls on a straight line
2. Assumption about errors: The error terms are independently and identically distributed normal
random variables, each with mean 0 and constant variance (homoscedasticity)
3. Assumptions about predictors: In multiple regression, predictor variables are assumed to be
linearly independent of one another.
What are the 6 most common problems in fitting linear regression? - CORRECT
ANSWER - 1. Non-linearity of the response-predictor relationship
2. Correlation of error terms
3. Non-constant variance of error terms
4. Outliers