SRM - SOA THEORY QUESTION
NOTES WITH CORRECT ANSWERS
A good test performance requires ______ variance and __________ bias. - Correct
Answers -low, low
Adding an interaction term will generally __________ bias and __________ variance. -
Correct Answers -reduce, increase
Bias refers to... - Correct Answers -... the error arising from the simplifying assumptions the model makes about the true relationship
Variance refers to... - Correct Answers -... the error arising from the method's
sensitivity to the particular training dataset used
The accuracy of the prediction of Y depends on - Correct Answers -Both reducible error
and irreducible error
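The split into reducible and irreducible error can be written as the standard decomposition of the expected test MSE at a point x_0. This is a sketch under the usual assumption that Y = f(X) + ε with E[ε] = 0; the symbols are notation chosen here, not taken from the notes.

```latex
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \underbrace{\mathrm{Var}\left(\hat{f}(x_0)\right)
  + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2}_{\text{reducible error}}
  + \underbrace{\mathrm{Var}(\varepsilon)}_{\text{irreducible error}}
```

Only the first two terms depend on the chosen method, which is why good test performance needs both low variance and low bias.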
In KNN, when K=1 the __________ error rate is 0 and the _________ error rate may be high. -
Correct Answers -Training, Test
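A minimal sketch of why 1-nearest-neighbor has zero training error: each training point is its own nearest neighbor, so it is always classified with its own label. The data and the helper names `knn_predict` and `error_rate` are made up for illustration.

```python
def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points nearest to x (one feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def error_rate(train_x, train_y, xs, ys, k):
    """Share of points in (xs, ys) that the KNN classifier gets wrong."""
    wrong = sum(knn_predict(train_x, train_y, x, k) != y for x, y in zip(xs, ys))
    return wrong / len(xs)


# Hypothetical data: one feature, two classes
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = ["A", "A", "B", "A", "B", "B"]
test_x = [1.4, 2.9, 3.6, 5.6]
test_y = ["A", "A", "A", "B"]

train_err = error_rate(train_x, train_y, train_x, train_y, k=1)
test_err = error_rate(train_x, train_y, test_x, test_y, k=1)
print(train_err, test_err)
```

With K=1 the fit follows every training label exactly, including noisy ones, which is the high-variance behavior that can raise the test error.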
When inference is the goal, do you want something that is flexible or interpretable? -
Correct Answers -interpretable
Will a flexible model always perform well on unseen data? - Correct Answers -Not always,
because a flexible model can overfit the training data.
Linear Regression in general terms is relatively ________. - Correct Answers -Not
Flexible
Supervised/Unsupervised and Parametric/Non-Parametric:
SLR, MLR, GLM - Correct Answers -Supervised and Parametric
Supervised/Unsupervised and Parametric/Non-Parametric:
Ridge, Lasso - Correct Answers -Supervised and Parametric
Supervised/Unsupervised and Parametric/Non-Parametric:
Weighted Least Squares - Correct Answers -Supervised and Parametric
Supervised/Unsupervised and Parametric/Non-Parametric:
Partial Least Squares - Correct Answers -Supervised and Parametric
Supervised/Unsupervised and Parametric/Non-Parametric:
K Nearest Neighbors (KNN) - Correct Answers -Supervised and Non-Parametric
Supervised/Unsupervised and Parametric/Non-Parametric:
Decision Tree - Correct Answers -Supervised and Non-Parametric
Supervised/Unsupervised and Parametric/Non-Parametric:
Bagging, Random Forest, Boosting - Correct Answers -Supervised and Non-Parametric
Supervised/Unsupervised and Parametric/Non-Parametric:
Cluster Analysis - Correct Answers -Unsupervised; the parametric/non-parametric
distinction is not applied to unsupervised methods
Supervised/Unsupervised and Parametric/Non-Parametric:
Principal Components Analysis (PCA) - Correct Answers -Unsupervised; the
parametric/non-parametric distinction is not applied to unsupervised methods
Supervised/Unsupervised and Parametric/Non-Parametric:
Principal Components Regression - Correct Answers -Supervised and Parametric
What Methods are Low Flexibility and High Interpretability - Correct Answers -Lasso
and Subset Selection
What Methods are Mid Flexibility and Mid Interpretability - Correct Answers -Least Squares,
Regression trees and Classification trees
What Methods are High Flexibility and Low Interpretability - Correct Answers -Bagging
and Boosting
When using cross validation to select the best model do you want to choose the model
with the highest or the lowest cross validation error? - Correct Answers -lowest
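Error degrees of freedom of SLR - Correct Answers -DF = n-2 (n observations minus the two estimated parameters, intercept and slope)
Each LOOCV model is trained on: - Correct Answers -n-1 observations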
How to spot heteroscedasticity. - Correct Answers -In a residual plot, the spread of the residuals grows as
the fitted values get larger (a funnel shape).
How to fix heteroscedasticity. - Correct Answers -A logarithm transformation or a square
root transformation of the response variable
What accommodates the possibility of the dependent variable being zero when taking logs? - Correct
Answers -Adding a constant to the dependent variable before transforming, e.g. ln(1+Y)
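A short sketch of the ln(1+Y) transformation using Python's standard library. The response values in `claims` are hypothetical.

```python
import math

# Hypothetical response values, including a zero
claims = [0.0, 1.0, 10.0, 250.0]

# math.log(0.0) raises ValueError, but log1p(y) = ln(1 + y) is defined at y = 0
transformed = [math.log1p(y) for y in claims]

# expm1(t) = exp(t) - 1 inverts the transformation back to the original scale
recovered = [math.expm1(t) for t in transformed]

print(transformed)
print(recovered)
```

Predictions made on the transformed scale need the same inverse step before they are read as values of Y.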
Two differences between ridge regression and lasso regression - Correct Answers -
1) Both methods shrink coefficients towards zero, but lasso can force some of the
estimates to be exactly equal to zero.
2) Unlike ridge regression, lasso performs variable selection, and hence results in
models that are easier to interpret.
Number of models to fit for Best Subset Selection with p predictors? - Correct Answers -2^p
Number of models to fit for Forward or Backward Stepwise Selection with p predictors? -
Correct Answers -1+[(p*(p+1))/2]
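The two counts can be checked with a small sketch. The function names are illustrative; the stepwise count is the null model plus p - k candidate models at step k, for k = 0, ..., p-1.

```python
from itertools import combinations


def best_subset_count(p):
    """Every subset of the p predictors is a candidate model: 2^p in total."""
    return 2 ** p


def stepwise_count(p):
    """Null model plus (p - k) candidates at each step k = 0, ..., p - 1."""
    return 1 + p * (p + 1) // 2


p = 10
# Brute-force enumeration of all predictor subsets, as a check on 2^p
enumerated = sum(1 for k in range(p + 1) for _ in combinations(range(p), k))

print(best_subset_count(p), stepwise_count(p), enumerated)
```

The gap widens quickly with p, which is why stepwise selection is used when the number of predictors makes best subset selection impractical.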
VIF for predictor j = - Correct Answers -1/(1-R_j^2), where R_j^2 is the R^2 from regressing predictor j on all the other predictors
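A minimal sketch of the VIF formula for the simplest case of two predictors, where the R^2 of one predictor regressed on the other equals their squared correlation. The data and the helper names are hypothetical; with more predictors, R^2 would come from a multiple regression instead.

```python
def r_squared_simple(x, y):
    """R^2 from regressing y on a single predictor x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def vif(r2):
    """Variance inflation factor given the R^2 of a predictor on the others."""
    return 1.0 / (1.0 - r2)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly 2 * x1, so strongly collinear with x1
x3 = [1.0, -2.0, 2.0, -2.0, 1.0]  # constructed to be uncorrelated with x1

vif_collinear = vif(r_squared_simple(x2, x1))
vif_uncorrelated = vif(r_squared_simple(x3, x1))
print(vif_collinear, vif_uncorrelated)
```

A VIF of 1 is the minimum and indicates no collinearity; the nearly collinear pair gives a value far above the commonly quoted warning thresholds of 5 or 10.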
Are OLS regression coefficients affected by scaling a predictor and refitting the model? -
Correct Answers -Only trivially: multiplying a predictor by c divides its coefficient by c, so the fitted values remain unchanged
Are Ridge regression coefficients affected by scaling a predictor and refitting the model? -
Correct Answers -Yes they will change
Are Lasso regression coefficients affected by scaling a predictor and refitting the model? -
Correct Answers -Yes they will change
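The scaling behavior can be checked numerically in the one-predictor case. This sketch assumes centered data with an unpenalized intercept, so the slope is Sxy / (Sxx + lambda), with lambda = 0 giving OLS. The data, the value of `lam`, and the function name `slope` are illustrative. Lasso is not coded here, but its penalty depends on the coefficient's scale in the same way.

```python
def slope(x, y, lam=0.0):
    """One-predictor slope on centered data: lam = 0 is OLS, lam > 0 is ridge."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / (sxx + lam)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
c = 100.0                      # e.g. changing units from metres to centimetres
x_scaled = [c * v for v in x]

ols, ols_scaled = slope(x, y), slope(x_scaled, y)
ridge, ridge_scaled = slope(x, y, lam=5.0), slope(x_scaled, y, lam=5.0)

# OLS: the slope is divided by c, so slope * predictor is the same either way
print(ols, ols_scaled * c)
# Ridge: undoing the scaling does not recover the original slope
print(ridge, ridge_scaled * c)
```

Because the penalty treats all coefficients on a common scale, predictors are typically standardized before fitting ridge or lasso.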
Which method guarantees identical test error estimates: K-fold cross-validation,
Validation Set, or LOOCV? - Correct Answers -LOOCV.
LOOCV is deterministic: every observation is held out exactly once, so no random
splitting is involved.
The validation set approach and k-fold cross-validation divide the data randomly into training
and validation sets, so different statisticians could obtain different
error estimates
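A sketch contrasting the two procedures on a simple linear regression. The data, the seeds, and the helper names (`fit_slr`, `cv_mse`, `loocv_folds`, `kfold_folds`) are made up for illustration.

```python
import random


def fit_slr(x, y):
    """Least squares intercept and slope for simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def cv_mse(x, y, folds):
    """Mean squared error of held-out predictions; folds are lists of indices."""
    sq_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        b0, b1 = fit_slr([x[i] for i in train], [y[i] for i in train])
        sq_errors += [(y[i] - (b0 + b1 * x[i])) ** 2 for i in held_out]
    return sum(sq_errors) / len(sq_errors)


def loocv_folds(n):
    """One fold per observation, so each fit trains on n - 1 observations."""
    return [[i] for i in range(n)]


def kfold_folds(n, k, seed):
    """Random partition of the indices into k folds; depends on the seed."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::k] for j in range(k)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 2.3, 2.8, 4.2, 4.9, 6.3, 6.8, 8.1]

loocv_a = cv_mse(x, y, loocv_folds(len(x)))
loocv_b = cv_mse(x, y, loocv_folds(len(x)))
kfold_a = cv_mse(x, y, kfold_folds(len(x), 4, seed=1))
kfold_b = cv_mse(x, y, kfold_folds(len(x), 4, seed=2))
print(loocv_a, loocv_b)   # identical: there is only one way to leave one out
print(kfold_a, kfold_b)   # may differ: the partition depends on the shuffle
```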
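Which is wider, the prediction interval or the confidence interval? - Correct Answers
-At the same confidence level and predictor values, the PI will always be wider than the CI, because it also accounts for the irreducible error of a new observation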
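For simple linear regression the comparison reduces to the two standard errors, since both intervals use the same t multiplier on n-2 degrees of freedom. This is a sketch on hypothetical data; the function name `interval_standard_errors` is illustrative.

```python
import math


def interval_standard_errors(x, y, x0):
    """Standard errors behind the CI (mean response) and the PI (new observation) at x0."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s2 = sse / (n - 2)  # residual variance uses n - 2 degrees of freedom in SLR
    leverage = 1.0 / n + (x0 - xbar) ** 2 / sxx
    se_mean = math.sqrt(s2 * leverage)          # used for the confidence interval
    se_pred = math.sqrt(s2 * (1.0 + leverage))  # used for the prediction interval
    return se_mean, se_pred


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.3, 1.9, 3.4, 3.7, 5.2, 5.8]

se_mean, se_pred = interval_standard_errors(x, y, x0=3.5)
print(se_mean, se_pred)
```

The extra 1 inside the square root is the variance of the new observation's error term, so the prediction standard error exceeds the mean-response standard error at every x0.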