ISYE 6501 - MIDTERM 2 EXAM | QUESTIONS AND CORRECT ANSWERS
| VERIFIED ANSWERS GRADED A+ | LATEST EXAM
When might overfitting occur?
When the number of factors is close to or larger than the number of data
points, the model may fit too closely to random effects in the data.
Why are simple models better than complex ones?
They require less data, there is less chance of including insignificant
factors, and they are easier to interpret.
What is forward selection?
Starting from no factors, we repeatedly pick the best new factor and, if it
is good enough (by R^2, AIC, or p-value), add it to the model and refit with
the current set of factors. At the end we remove any factors whose quality
falls below a certain threshold.
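A rough sketch of the procedure, using R^2 improvement as the quality measure (the min_gain stopping threshold is an illustrative choice, not a value from the course):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection: at each step, add the factor that most
    improves R^2; stop when the best improvement falls below min_gain."""
    remaining = list(range(X.shape[1]))
    chosen, best_r2 = [], 0.0
    while remaining:
        # Score every candidate factor when added to the current set
        r2, j = max(
            (LinearRegression().fit(X[:, chosen + [k]], y)
                               .score(X[:, chosen + [k]], y), k)
            for k in remaining
        )
        if r2 - best_r2 < min_gain:  # best candidate is not good enough
            break
        chosen.append(j)
        remaining.remove(j)
        best_r2 = r2
    return chosen
```

On synthetic data where only the first two factors carry signal, the procedure should pick the strongest factor first and collect the true predictors.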
What is backward elimination?
We start with all factors and find the worst one according to a supplied
threshold (e.g. p = 0.15). If it is worse than the threshold, we remove it
and repeat the process. We continue until every remaining factor passes,
then remove the factors below a stricter second threshold (e.g. p = 0.05)
and fit the model with the final set of factors.
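A minimal sketch of the first phase, with p-values computed from a standard OLS fit (the p = 0.15 threshold matches the answer above; the helper function is illustrative, not from the course):

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided t-test p-values for each coefficient in an OLS fit
    (intercept included in the fit but excluded from the result)."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    resid = y - Xd @ beta
    dof = n - k - 1
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    return (2 * stats.t.sf(np.abs(beta / se), dof))[1:]

def backward_eliminate(X, y, threshold=0.15):
    """Repeatedly drop the worst factor while its p-value exceeds
    the threshold; stop when every remaining factor passes."""
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_pvalues(X[:, keep], y)
        worst = int(np.argmax(p))
        if p[worst] <= threshold:
            break
        keep.pop(worst)
    return keep
```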
What is stepwise regression?
A combination of forward selection and backward elimination. We can start
with either all factors or no factors, and at each step we add or remove a
factor. After each addition we re-check the current factors and immediately
eliminate any that no longer appear good enough.
What type of algorithm is stepwise selection?
A greedy algorithm: at each step it takes the one choice that looks best at
that moment.
What is lasso?
A variable selection method where the coefficients are chosen by minimizing
the squared error, subject to the sum of the coefficients' absolute values
not exceeding a threshold t.
How do you choose t in lasso?
Run lasso with different values of t and see which gives the best
trade-off, for example by cross-validation.
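In scikit-learn this sweep can be done automatically with LassoCV (the data below is synthetic, for illustration only). Note that sklearn parameterizes the penalty as a multiplier alpha rather than a budget t; the two forms are equivalent, with a smaller t corresponding to a larger alpha:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Scale first, then let 5-fold cross-validation pick the penalty level
Xs = StandardScaler().fit_transform(X)
model = LassoCV(cv=5).fit(Xs, y)

print(model.alpha_)  # the chosen penalty
print(model.coef_)   # noise factors' coefficients driven to (near) zero
```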
Why do we have to scale the data for lasso?
If we don't, the units of measurement of each factor will artificially
affect how big its coefficient needs to be, and therefore how hard the
constraint penalizes it.
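A small illustration of this effect on synthetic data: the second factor carries the same kind of information as the first, but is recorded in units 1000 times smaller, so its coefficient must be 1000 times larger and the lasso penalty crushes it:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = rng.normal(size=300)
X = np.column_stack([a, b / 1000])           # same signal, tiny units
y = a + b + rng.normal(scale=0.1, size=300)  # both factors equally predictive

unscaled = Lasso(alpha=0.1).fit(X, y)
scaled = Lasso(alpha=0.1).fit(StandardScaler().fit_transform(X), y)

print(unscaled.coef_)  # tiny-unit factor zeroed out purely by its units
print(scaled.coef_)    # after scaling, both factors are kept
```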
What is elastic net?
A variable selection method that minimizes the squared error while
constraining a combination of the absolute values of the coefficients and
their squares.
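In scikit-learn's (penalty, rather than constraint) parameterization, the objective is ||y - Xb||^2 / (2n) + alpha * r * ||b||_1 + (alpha / 2) * (1 - r) * ||b||_2^2, where r (l1_ratio) blends the lasso and ridge penalties; the data and parameter values below are illustrative:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

Xs = StandardScaler().fit_transform(X)
# l1_ratio=1 would be pure lasso, l1_ratio=0 pure ridge; 0.5 mixes them
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(Xs, y)
print(enet.coef_)  # signal coefficients large, noise coefficients near zero
```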
What is a key difference between stepwise regression and lasso regression?
Lasso requires scaled data while stepwise regression does not: if the data
is not scaled, the coefficients can have artificially different orders of
magnitude, which gives them unbalanced effects on the lasso constraint.
Why doesn't ridge regression perform variable selection?
Because the penalty uses the squared coefficient values, the coefficients
are regularized closer to zero but are never driven exactly to zero, so no
factor is eliminated.
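A quick comparison on synthetic data illustrates this (the alpha values are arbitrary demo choices):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=200)
Xs = StandardScaler().fit_transform(X)

ridge = Ridge(alpha=100.0).fit(Xs, y)
lasso = Lasso(alpha=0.5).fit(Xs, y)

# Ridge shrinks every coefficient toward zero but none becomes exactly zero,
# so every factor stays in the model; lasso zeroes noise factors outright.
print(ridge.coef_)
print(lasso.coef_)
```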
What are the pros and cons of greedy algorithms (forward selection,
backward elimination, stepwise regression)?
They are good for initial analysis, but they often don't perform as well on
other data because they fit more closely to random effects than we'd like,
which makes the fit look better than it really is.
What are the pros and cons of lasso and elastic net?
They are slower, but they help build models that make better predictions.
Which two methods does elastic net combine, and what downsides does it
inherit from them?
Ridge regression and lasso.
Advantages: variable selection from lasso and the predictive benefits of
ridge regression.
Disadvantages: like lasso, it arbitrarily rules out some correlated
variables (we don't know which of them should be the one left out); like
ridge regression, it underestimates the coefficients of very predictive
variables.
What are some downsides of surveys?