PASSED ANSWERS!!
Visually, how can you tell there is an association between two categorical
variables? Answer - On a mosaic plot, the segments representing the levels of
y will vary between levels of x. See the example on page 11/138 if need be.
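The mosaic-plot check above can also be done with numbers. The sketch below computes the share of each y level within each x level, which is what the segment heights show. The data and the names (`data`, `conditional_proportions`) are made up for illustration.

```python
from collections import Counter

# Hypothetical (x, y) observations: x = department, y = promoted or not.
data = [
    ("sales", "yes"), ("sales", "yes"), ("sales", "no"), ("sales", "yes"),
    ("tech", "no"), ("tech", "no"), ("tech", "yes"), ("tech", "no"),
]

def conditional_proportions(pairs):
    """Return {x_level: {y_level: share of that y level within the x level}}."""
    totals = Counter(x for x, _ in pairs)   # number of rows per x level
    counts = Counter(pairs)                 # number of rows per (x, y) cell
    result = {}
    for (x, y), n in counts.items():
        result.setdefault(x, {})[y] = n / totals[x]
    return result

props = conditional_proportions(data)
for x_level, shares in props.items():
    print(x_level, shares)
```

If the shares of y look about the same in every x level, the segments line up and there is little sign of an association. Here the "yes" share differs a lot between the two departments.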
Visually, how can you tell if an association exists between a y that is
quantitative and an x that is numerical but grouped into ranges (such as salary
ranges)? Answer - Using side-by-side box plots, an association exists if the
mean/median lines for the box plots differ between groups (see 13/138 for an
example).
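As a rough numeric companion to the box-plot check, the sketch below compares the median of y within each x range. The salary bands and the y values are invented for illustration.

```python
from statistics import median

# Hypothetical y values (e.g. a satisfaction score) grouped by salary range.
groups = {
    "under 50k": [2, 3, 3, 4, 5],
    "50k-100k": [4, 5, 6, 6, 7],
    "over 100k": [7, 8, 8, 9, 10],
}

# One median per group: these are the center lines drawn inside each box.
medians = {band: median(values) for band, values in groups.items()}
print(medians)
```

Medians that shift from one group to the next suggest an association. Medians that sit at about the same height suggest none.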
Visually, how do you determine if there is an association between two
quantitative variables? Answer - Looking at a scatterplot, the points should
show some pattern (a form or direction) rather than just a shapeless cloud.
Interpretation for B1 in a simple linear regression model? Answer - Two
individuals that differ in x by one unit are expected to differ in y by B1
units, on average. If B1 > 0, the person with the larger value of x is expected
to have the larger y.
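A minimal sketch of the slope interpretation, assuming an ordinary least-squares fit on made-up data. The helper names (`fit_line`, `predict`) are hypothetical. It shows that the predicted values for two x values one unit apart differ by exactly the slope.

```python
def fit_line(xs, ys):
    """Least-squares intercept (b0) and slope (b1) for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data that roughly follows y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.2, 6.8, 9.1, 11.0]

b0, b1 = fit_line(xs, ys)

def predict(x):
    """Average y expected at this x, according to the fitted line."""
    return b0 + b1 * x

# Two individuals whose x values differ by one unit.
diff = predict(4) - predict(3)
print(b1, diff)
```

Since the slope here is positive, the individual with the larger x gets the larger predicted y.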
T/F: A simple linear relationship describes the relationship between x values
and the average value of y and cannot be used to predict individual values of y.
Answer - True
What do you have to add to a simple linear regression model in order to use it
to predict individual values of y? Answer - You have to add a disturbance
term (standard error) that represents the typical distance an individual value
of y is from the average value of y.
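The "typical distance" in the answer above can be estimated from the residuals. The sketch below assumes the usual residual standard error for simple linear regression, sqrt(SSE / (n - 2)), and uses made-up data. The variable names are illustrative only.

```python
import math

# Made-up data that roughly follows y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.2, 6.8, 9.1, 11.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

# Residual = observed y minus the average y the line gives at that x.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Residual standard error: typical size of a residual (n - 2 degrees of freedom).
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))
print(round(s, 4))
```

An individual y is then described as the line's average at that x plus a disturbance whose typical size is about `s`.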
Under what condition do you have to test the assumptions of a simple linear
regression model? Answer - If n < 25.
T/F: Multiple regression models typically follow the normal distribution model.
Answer - True
What are the assumptions that need to be checked for a multiple regression
model? Answer -
1. Linearity: There should be no curvature in the residual plots.
2. Equal spread: The spread of the points on the diagnostic plot should be
relatively even from left to right (no heteroscedasticity).
3. Normality: There should be no significant departure from a normal
distribution.
*If n < 25, the assumptions can just be checked visually.
What are the two methods for determining if outliers are influential points?
Describe each. Answer - 1. Leverage: The leverage is the distance a particular
point is from the "predictor data cloud" (where both axes are x's). If a point
is significantly farther from the data cloud than the others, then it is an
influential point.
2. Deleted Studentized Residuals: The residual of a point based on a regression
line that is built without it. If this is large, then the point is pulling the
line toward itself and is therefore influential.
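Both checks can be sketched for the one-predictor case. The code below assumes the standard simple-regression leverage formula, h = 1/n + (x - x_bar)^2 / Sxx, and computes each deleted studentized residual by literally refitting the line without the point. The data and function names are made up for illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit; returns intercept, slope, mean of x, and Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1, x_bar, sxx

def leverages(xs):
    """Leverage of each point: how far its x is from the center of the x's."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

def deleted_studentized_residuals(xs, ys):
    """Residual of each point against a line fit WITHOUT that point,
    divided by the standard error of that prediction."""
    out = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        b0, b1, x_bar, sxx = fit_line(rest_x, rest_y)
        m = len(rest_x)
        sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(rest_x, rest_y))
        s = math.sqrt(sse / (m - 2))
        se_pred = s * math.sqrt(1 + 1 / m + (xs[i] - x_bar) ** 2 / sxx)
        out.append((ys[i] - (b0 + b1 * xs[i])) / se_pred)
    return out

# Made-up data: the last point sits far to the right and well off the trend.
xs = [1, 2, 3, 4, 5, 12]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

h = leverages(xs)
t = deleted_studentized_residuals(xs, ys)
for x, hi, ti in zip(xs, h, t):
    print(x, round(hi, 3), round(ti, 2))
```

In this made-up data the last point has both the highest leverage and by far the largest deleted studentized residual. Cutoffs for "large" vary by textbook, so treat any specific threshold as an assumption to check against the course material.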