BUAL 2650 Final Exam – Lee | Questions and Answers
Simple Regression - --only 1 predictor variable
-y-hat = b0 + b1*x
- Multiple Regression - --more than 1 predictor variable
-y-hat = b0 + b1*x1 + b2*x2 + ...
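A minimal sketch of both models in Python, assuming statsmodels for the least-squares fit (the toy data and variable names are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

# toy data (invented): y roughly follows 2*x1
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# simple regression: y-hat = b0 + b1*x1
simple = sm.OLS(y, sm.add_constant(x1)).fit()
print(simple.params)   # [b0, b1]

# multiple regression: y-hat = b0 + b1*x1 + b2*x2
X = sm.add_constant(np.column_stack([x1, x2]))
multiple = sm.OLS(y, X).fit()
print(multiple.params) # [b0, b1, b2]
```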
- Residual - -the difference between the actual data and the value we
predict for it
= observed - predicted
= y - y-hat
- Interpreting Residuals - --Negative residual: the regression equation
provided an overestimate of the data.
-Positive residual: the regression equation provided an underestimate of
the data.
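A quick numeric illustration of the sign convention (all numbers invented):

```python
import numpy as np

y     = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # observed values
y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # values the model predicts

resid = y - y_hat                              # residual = observed - predicted

for obs, r in zip(y, resid):
    verdict = "underestimate" if r > 0 else "overestimate"
    print(f"y = {obs}: residual = {r:+.1f} -> the model gave an {verdict}")
```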
- Linear regression only works for... - -Linear models
- What do we want to see from a residual plot? - --No pattern
-No plot thickening (the spread stays constant)
-Random scatter
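A sketch of such a plot with matplotlib (the fitted values and residuals are made up):

```python
import numpy as np
import matplotlib.pyplot as plt

y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # fitted values (invented)
resid = np.array([0.1, -0.1, 0.2, -0.2, 0.1])  # residuals (invented)

plt.scatter(y_hat, resid)        # want: random scatter, no curve, no fan shape
plt.axhline(0, linestyle="--")   # residuals should straddle zero
plt.xlabel("predicted value (y-hat)")
plt.ylabel("residual")
plt.show()
```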
- Extrapolation - --venturing into new x territory
-used to estimate values that go beyond a set of given data or observations
-very dangerous
- Dangers of Extrapolation - --assumes there is a linear relationship beyond
the range of the data
-assumes that nothing about the relationship between x and y changes at
extreme values of x
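A small numeric illustration of the risk, assuming invented data observed only for x between 1 and 6:

```python
import numpy as np

# data observed only for x between 1 and 6 (numbers invented)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 3.0 + 2.0 * x + np.array([0.1, -0.2, 0.1, 0.0, -0.1, 0.1])

b1, b0 = np.polyfit(x, y, 1)     # least-squares slope and intercept

print(b0 + b1 * 5.5)   # interpolation: inside the observed x range, safer
print(b0 + b1 * 50.0)  # extrapolation: far outside 1..6 -- assumes the linear
                       # pattern still holds out there, which nothing guarantees
```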
- Interpreting the Intercept of a MRM - -is it meaningful or not? we decide if
it is meaningful by asking whether it makes sense for all of the predictor (x)
variables to equal 0
- Is this multiple regression model any good at all? - -Test hypotheses: H0:
all beta values = 0 vs. HA: at least 1 beta value does not = 0
-then, use the overall F-test
- Rules for interpreting multiple regression coefficients - --express in terms
of the units of the dependent variable
-always say "all else being equal"
-always mention the other variables by saying "after (variable #1) and
(variable #2) are accounted for," and interpret the coefficient
- How do we determine if a multiple regression model is significant? - -p-
value (needs to be small) and F-statistic (needs to be big - this means that at
least one of the predictors accounts for the variation in predicting the
dependent variable.)
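A sketch of reading the overall test off a statsmodels fit (the data are simulated; only x1 truly matters):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=30)
x2 = rng.normal(size=30)
y  = 1.0 + 2.0 * x1 + rng.normal(size=30)   # only x1 actually drives y here

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# overall test of H0: all betas = 0 vs. HA: at least one beta != 0
print(fit.fvalue)     # big F-statistic -> evidence against H0
print(fit.f_pvalue)   # small p-value  -> the model as a whole is significant
```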
- R-squared - --"Goodness of fit"
-a statistical measure of how close the data are to the fitted regression line
(how well observed outcomes are replicated by the model)
- Dangers of R-squared - --R-squared never decreases when more predictors
are added, so a bigger model always looks at least as good
-a high R-squared alone does not mean the model is appropriate or that x
causes y
- Interpreting R-Square - -R-square = .80 indicates that the model explains
80% of the variability of the response (y) data OR R-square = 0.41 indicates
that 41% of the variability of height can be explained by the model.
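A worked computation of R-square from its definition, R-square = 1 - SSE/SST (numbers invented):

```python
import numpy as np

y     = np.array([2.0, 4.0, 5.0, 4.0, 5.0])   # observed (invented)
y_hat = np.array([2.2, 3.6, 4.6, 4.4, 5.2])   # fitted (invented)

sse = np.sum((y - y_hat) ** 2)       # variation the model fails to explain
sst = np.sum((y - y.mean()) ** 2)    # total variation in y

r_squared = 1 - sse / sst
print(r_squared)   # about 0.91 here: ~91% of the variability in y is explained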
- Outliers - -points with y-values far from the regression model; points far
from the body of the data
- Leverage - -A data point can also be unusual if its x-value is far from the
mean of the x-values. Such points are said to have high leverage.
- Influential Point - -We say that a point is influential if omitting it from the
analysis gives a very different slope for the model
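A sketch of checking influence by refitting without the suspect point (toy data with one deliberately placed high-leverage point):

```python
import numpy as np

def slope(x, y):
    """Least-squares slope b1 of y on x."""
    return np.polyfit(x, y, 1)[0]

# first five points follow y = 2x; the last has an x far from the mean
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])   # high-leverage x = 20
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 15.0])  # and it breaks the pattern

print(slope(x, y))            # slope with the point included
print(slope(x[:-1], y[:-1]))  # slope with it omitted (2.0)
# the slopes differ a lot, so the high-leverage point is influential
```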
- Causality Warning - -no matter how strong the association, no matter how
large the r-squared value, there is no way to conclude from a regression
alone that one variable caused the other
- Autocorrelation - -When values at time t are correlated with values at
time t-1, we say the values are autocorrelated in the first order. If values are
correlated with values two time periods back, we say second-order
autocorrelation is present, and so on.
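A sketch of detecting first-order autocorrelation on simulated residuals (the 0.8 carryover factor is invented; durbin_watson comes from statsmodels):

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# simulate errors where each value depends on the one before it
rng = np.random.default_rng(1)
e = rng.normal(size=100)
resid = np.empty(100)
resid[0] = e[0]
for t in range(1, 100):
    resid[t] = 0.8 * resid[t - 1] + e[t]   # first-order autocorrelation

# lag-1 correlation: values at time t vs. values at time t-1
print(np.corrcoef(resid[:-1], resid[1:])[0, 1])   # well above 0

# Durbin-Watson: near 2 means no autocorrelation; well below 2 means positive
print(durbin_watson(resid))
```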
- Autoregression and P-values - -large p-values (e.g., .870 and .699) mean
that the corresponding lag terms are not significant
- Why is autocorrelation a problem? - -When data are highly correlated over
time, each data point is similar to those around it, so each data point
provides less additional information than if the points had been independent.
All regression inference is based on independent errors.