How does the skew of a distribution affect its median relative to its mean? - Answers As a rule of thumb:
Left skew -> mean < median
No skew -> mean = median
Right skew -> mean > median
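The rule of thumb can be checked on small samples. This is a minimal sketch: the three samples are hypothetical toy data, chosen so that one long tail pulls the mean toward it while the median stays put.

```python
from statistics import mean, median

# Hypothetical samples chosen for illustration only.
right_skewed = [1, 2, 2, 3, 20]   # long tail to the right
left_skewed = [-20, 3, 4, 4, 5]   # long tail to the left
symmetric = [1, 2, 3, 4, 5]       # no skew

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    # The extreme value drags the mean toward the tail; the median ignores its size.
    print(f"{name}: mean={mean(data)}, median={median(data)}")
```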
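Define correlation coefficient - Answers The (Pearson) correlation coefficient measures the strength and direction of the linear relationship between two variables. It ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship), and 0 means no linear relationship.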
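A from-scratch sketch of the Pearson correlation coefficient: the covariance of the two variables divided by the product of their spreads. The function name `pearson_r` and the data are hypothetical. The last call shows that a strong but non-linear relationship (y = x^2 on symmetric x) can still give r = 0, because r captures only linear association.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-movement around the means
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))                 # perfect positive linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))                 # perfect negative linear relationship
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # y = x^2: strong but not linear
```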
Define total deviation, explained deviation, and unexplained deviation for an OLS linear regression problem. - Answers Total deviation: difference between an observed value and the mean (y_i - ȳ)
Explained deviation: difference between the predicted value and the mean (ŷ_i - ȳ)
Unexplained deviation (error/residual): difference between the observed value and the predicted value (y_i - ŷ_i)
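For each observation, total deviation = explained deviation + unexplained deviation. The sketch below fits a one-predictor OLS line with the closed-form slope and intercept and prints the three pieces. The helper `fit_simple_ols` and the toy data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical toy data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = fit_simple_ols(x, y)
y_bar = sum(y) / len(y)
for xi, yi in zip(x, y):
    y_hat = b0 + b1 * xi
    total = yi - y_bar         # total deviation
    explained = y_hat - y_bar  # explained deviation
    residual = yi - y_hat      # unexplained deviation (residual)
    print(f"x={xi}: total={total:.3f} = explained {explained:.3f} + residual {residual:.3f}")
```

Because the fit includes an intercept, the residuals also sum to (numerically) zero.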
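Define total sum of squares (SST), sum of squared errors (SSE), and sum of squares regression (SSR) and how they relate to one another. - Answers Each sum runs over all observations i:
SST = Σ (observation - average)^2 = Σ (y_i - ȳ)^2
SSE = Σ (observation - prediction)^2 = Σ (y_i - ŷ_i)^2
SSR = Σ (prediction - average)^2 = Σ (ŷ_i - ȳ)^2
SST = SSE + SSR (this decomposition holds for an OLS fit that includes an intercept)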
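A minimal sketch that computes the three sums of squares for a one-predictor OLS fit and checks that SST = SSE + SSR. The function names and toy data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

def sums_of_squares(x, y):
    """Return (SST, SSE, SSR) for the OLS fit of y on x."""
    b0, b1 = fit_simple_ols(x, y)
    y_bar = sum(y) / len(y)
    preds = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)              # observation vs. average
    sse = sum((yi - p) ** 2 for yi, p in zip(y, preds))   # observation vs. prediction
    ssr = sum((p - y_bar) ** 2 for p in preds)            # prediction vs. average
    return sst, sse, ssr

# Hypothetical toy data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
sst, sse, ssr = sums_of_squares(x, y)
print(sst, sse + ssr)  # equal up to floating-point rounding
```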
Define R^2 and adjusted R^2 - Answers R^2 = 1 - SSE/SST = SSR/SST
R^2 = explained variation / total variation: the fraction of the variance in the dependent variable that is explained by the independent variables
Adjusted R^2 adds a penalty based on the number of independent variables (p) relative to the number of observations (n):
Adjusted R^2 = 1 - SSE*(n-1)/(SST*(n-p-1))
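Both formulas translate directly into code. In this sketch the inputs SSE = 0.091 and SST = 38.9 are hypothetical values from a toy fit with n = 5 observations; holding SSE fixed while raising p shows how the adjusted version penalizes extra predictors.

```python
def r_squared(sse, sst):
    """R^2 = 1 - SSE/SST."""
    return 1 - sse / sst

def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R^2 = 1 - SSE*(n-1) / (SST*(n-p-1)) for n observations, p predictors."""
    return 1 - (sse * (n - 1)) / (sst * (n - p - 1))

sse, sst = 0.091, 38.9  # hypothetical sums of squares
r2 = r_squared(sse, sst)
adj_1 = adjusted_r_squared(sse, sst, n=5, p=1)
adj_2 = adjusted_r_squared(sse, sst, n=5, p=2)  # same fit quality, one more predictor
print(r2, adj_1, adj_2)
```

The extremes match the interpretation of R^2: `r_squared(0.0, sst)` is 1 (no unexplained variation) and `r_squared(sst, sst)` is 0 (nothing explained).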
What values can R^2 take? How does one interpret the value of R^2? - Answers 0 <= R^2 <= 1 (for an OLS fit with an intercept, evaluated on the data it was fit to)
R^2 = 1: X accounts for all of the variation in Y
R^2 = 0: X accounts for none of the variation in Y
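Explain T-value, P-value and F-statistic - Answers Null hypothesis (H_0): coefficient is 0
T-value: coefficient estimate divided by its standard error
P-value: the probability, assuming H_0 is true, of observing a T-value at least as extreme as the one obtained. Reject H_0 if the p-value is low (e.g., below 0.05). The p-value is not the probability that H_0 is true.
F-statistic: tests the joint null hypothesis that all slope coefficients are 0; a large F-statistic (small p-value) is evidence that at least one predictor is related to the response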
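For a one-predictor regression, the slope's standard error is sqrt((SSE/(n-2)) / Σ(x_i - x̄)^2), so the T-value is the slope divided by that. The sketch below computes it from scratch; `slope_t_value` and the toy data are hypothetical.

```python
from math import sqrt

def slope_t_value(x, y):
    """Return (slope, standard error of slope, T-value) for simple OLS of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = sqrt(sse / (n - 2) / sxx)  # residual variance uses n - 2 degrees of freedom
    return b1, se_b1, b1 / se_b1

# Hypothetical toy data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
b1, se_b1, t = slope_t_value(x, y)
print(b1, se_b1, t)
```

To turn the T-value into a two-sided p-value, compare it to a t distribution with n - 2 degrees of freedom; if SciPy is available, `2 * scipy.stats.t.sf(abs(t), df=len(x) - 2)` does this.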
How do you compute the F-statistic? - Answers F-statistic = (SSR/p) / (SSE/(n-p-1)), where p is the number of predictors and n is the number of observations
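A sketch of the F-statistic formula, with a sanity check: with a single predictor, F equals the slope's T-value squared, because SSR = b1^2 * Σ(x_i - x̄)^2. The helper names and toy data are hypothetical.

```python
from math import sqrt

def simple_regression_stats(x, y):
    """Return (SSR, SSE, slope T-value) for simple OLS of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    preds = [b0 + b1 * a for a in x]
    sse = sum((b - p) ** 2 for b, p in zip(y, preds))
    ssr = sum((p - my) ** 2 for p in preds)
    t = b1 / sqrt(sse / (n - 2) / sxx)
    return ssr, sse, t

def f_statistic(ssr, sse, n, p):
    """F = (SSR/p) / (SSE/(n-p-1))."""
    return (ssr / p) / (sse / (n - p - 1))

# Hypothetical toy data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
ssr, sse, t = simple_regression_stats(x, y)
F = f_statistic(ssr, sse, n=len(x), p=1)
print(F, t ** 2)  # with one predictor these agree up to rounding
```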
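How does R^2 change as a function of the number of variables? - Answers R^2 will either increase or stay the same as you add more variables; it never decreases, which is why adjusted R^2 is preferred when comparing models with different numbers of predictors.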
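To see that R^2 cannot drop when a variable is added, the sketch below fits y on x1, then on x1 plus a predictor of pure random noise. It solves the OLS normal equations with a small hand-written Gaussian elimination so it needs only the standard library; the functions, seed, and simulated data are all hypothetical.

```python
import random

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def r_squared(columns, y):
    """OLS R^2 for y regressed on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    w = solve(XtX, Xty)
    preds = [sum(wa * xa for wa, xa in zip(w, row)) for row in X]
    y_bar = sum(y) / n
    sse = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

random.seed(0)
n = 30
x1 = [random.uniform(0, 10) for _ in range(n)]
y = [3 + 2 * a + random.gauss(0, 1) for a in x1]
noise = [random.gauss(0, 1) for _ in range(n)]  # predictor unrelated to y

r2_one = r_squared([x1], y)
r2_two = r_squared([x1, noise], y)
print(r2_one, r2_two)  # r2_two >= r2_one even though noise is irrelevant
```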
What are the 3 main assumptions of linear regression? - Answers 1. Linearity assumption: the expected value of Y at each value of X lies on a straight line
2. Assumption about errors: The error terms are independently and identically distributed
normal random variables, each with mean 0 and constant variance (homoscedasticity)
3. Assumption about predictors: In multiple regression, predictor variables are assumed to be
linearly independent of one another (no perfect collinearity).
What are the 6 most common problems in fitting linear regression? - Answers 1. Non-linearity of the response-predictor relationship
2. Correlation of error terms
3. Non-constant variance of error terms
4. Outliers
5. High-leverage points
6. Collinearity
How can you identify if a relationship is nonlinear? If it is, how can you address it? - Answers
Identify it by plotting the two variables against one another (the points should follow a line) or by plotting residuals vs. fitted values (there should be no pattern)
If the relationship is nonlinear, you can transform the variables (e.g., add a higher-order term or take a log), look for outliers, check whether a variable is missing, or check for systematic bias.
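The residuals-vs-fitted check can be sketched numerically. Below, hypothetical data follow y = x^2 exactly; a straight-line fit leaves U-shaped residuals (positive at both ends, negative in the middle), and regressing on the transformed predictor x^2 removes the pattern. The helper names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

def residuals(x, y):
    """Observed minus fitted values for the OLS line of y on x."""
    b0, b1 = fit_simple_ols(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]  # hypothetical: the true relationship is quadratic

raw = residuals(x, y)
print([round(r, 2) for r in raw])    # U-shaped: positive at the ends, negative in the middle

x_sq = [xi ** 2 for xi in x]         # transformed (higher-order) predictor
fixed = residuals(x_sq, y)
print([round(r, 2) for r in fixed])  # essentially zero everywhere: the pattern is gone
```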
How can one detect autocorrelation of error terms? What are the effects of autocorrelation of error terms? - Answers Detection: Durbin-Watson test
Effects of autocorrelation of error terms:
- Estimated standard errors will underestimate the true standard errors
- Confidence and prediction intervals will be narrower than they should be, and p-values will be lower than they should be
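The Durbin-Watson statistic is DW = Σ_{t>=2} (e_t - e_{t-1})^2 / Σ_t e_t^2 over residuals in time order. It ranges from 0 to 4: values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. A minimal sketch with hypothetical residual sequences:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

drifting = [3, 2, 1, 0, -1, -2, -3]    # neighbors similar: positive autocorrelation
alternating = [1, -1, 1, -1, 1, -1]    # sign flips each step: negative autocorrelation
print(durbin_watson(drifting))     # well below 2
print(durbin_watson(alternating))  # well above 2
```

In practice, statsmodels provides `statsmodels.stats.stattools.durbin_watson` for the same computation on a residual array.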