ISYE 6501 - Finals sample
Quiz Prep Test Questions and Answers
Which of the following three statements is correct?
- The selected model's expected performance on test data will be better than its
expected performance on the validation data, because there is a selection bias: the
selected model is more likely to have worse-than-average performance on random
patterns in the validation data.
- The selected model's expected performance on test data will be the same as its
expected performance on the validation data, because the validation data and the test
data are the same.
- The selected model's expected performance on test data will be worse than its
expected performance on the validation data, because there is a selection bias: the
selected model is more likely to have better-than-average performance on random
patterns in the validation data.
Answer: The selected model's expected performance on test data will be worse than its
expected performance on the validation data, because there is a selection bias: the
selected model is more likely to have better-than-average performance on random
patterns in the validation data.
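The selection bias in this answer can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not part of the original quiz: every candidate model is given the same true accuracy (the hypothetical parameter true_accuracy), so any difference in validation scores is pure chance. The function name and all parameter values are made up for illustration.

```python
import random

def simulate_selection_bias(n_models=20, n_points=100, n_trials=200,
                            true_accuracy=0.7, seed=0):
    """Average validation and test accuracy of the model picked on validation data.

    Assumption: all models have the same true accuracy, so observed
    differences between them come only from random patterns in the data.
    """
    rng = random.Random(seed)
    val_total = 0.0
    test_total = 0.0
    for _ in range(n_trials):
        # Observed validation accuracy of each candidate model.
        val_scores = [
            sum(rng.random() < true_accuracy for _ in range(n_points)) / n_points
            for _ in range(n_models)
        ]
        # Select the model that looks best on the validation data.
        best = max(range(n_models), key=lambda i: val_scores[i])
        # Evaluate the selected model on fresh, independent test data.
        test_score = sum(
            rng.random() < true_accuracy for _ in range(n_points)
        ) / n_points
        val_total += val_scores[best]
        test_total += test_score
    return val_total / n_trials, test_total / n_trials

avg_val, avg_test = simulate_selection_bias()
print(f"selected model, validation accuracy: {avg_val:.3f}")
print(f"selected model, test accuracy:       {avg_test:.3f}")
```

Under these assumptions the selected model's validation accuracy comes out above its test accuracy on average, because the selection step favors whichever model got lucky on the validation data.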
Which of the following three statements is correct?
- It is unclear how the selected model's expected performance on test data compares to
its observed performance on real-time data, because the training data and the test data
were taken from the same population, but the real-time data might be different.
- The selected model's expected performance on test data must be worse than its
observed performance on real-time data, because the training data and test data were
taken from the same population, but the real-time data might be different.
- The selected model's expected performance on test data must be better than its
observed performance on real-time data, because the training data and test data were
taken from the same population, but the real-time data might be different.
Answer: It is unclear how the selected model's expected performance on test data compares to
its observed performance on real-time data, because the training data and the test data
were taken from the same population, but the real-time data might be different.
A positive correlation has been observed between hours of sleep and self-reported
happiness (people who sleep more are happier, and happier people sleep more). Based
on that observed correlation, select all of the following statements about the direction of
causality between sleep and happiness that are true.
A. Lack of sleep makes people unhappy: The less people sleep, the less happy they
feel.
B. Unhappiness causes lack of sleep: When people feel unhappy, they have trouble
sleeping.
C. Both less sleep and more unhappiness are positively correlated with another factor,
which causes both.
D. Can't tell without more analysis.
Answer:
A. NOT SELECTED AS TRUE
B. NOT SELECTED AS TRUE
C. NOT SELECTED AS TRUE
D. SELECTED AS TRUE
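One way to see why correlation alone cannot settle the direction of causality: the sketch below generates two variables that are both driven by a hidden third factor and do not affect each other at all, yet they still come out positively correlated. The variable names (hidden, sleep, happiness), the noise model, and the sample size are illustrative assumptions, not data from the quiz.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
# Hypothetical hidden factor that influences both variables.
hidden = [rng.gauss(0, 1) for _ in range(2000)]
# Neither variable depends on the other; each is hidden factor plus noise.
sleep = [h + rng.gauss(0, 1) for h in hidden]
happiness = [h + rng.gauss(0, 1) for h in hidden]

r = pearson(sleep, happiness)
print(f"correlation between sleep and happiness: {r:.2f}")
```

The same positive correlation could also be produced by sleep driving happiness or by happiness driving sleep, which is why more analysis is needed before choosing among options A, B, and C.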
For each of the four situations below, specify which would be better: including a "data
missing" binary variable or imputing missing data.
A. 2% of the data points have missing values, and you can build a good predictive
model for the missing data.
B. 2% of the data points have missing values, and you cannot build a good predictive
model for the missing data.
C. 50% of the data points have missing values for this variable, and you believe that
points with missing data have a different distribution of values from points where data is
present.
D. 50% of the data points have missing values for this variable, and you cannot build a
good predictive model for the missing data.
Answer:
A. IMPUTE MISSING DATA
B. "DATA MISSING" BINARY VARIABLE
C. "DATA MISSING" BINARY VARIABLE
D. "DATA MISSING" BINARY VARIABLE
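The two approaches in this question can be sketched as follows. This is a simplified illustration: mean imputation stands in for the predictive model mentioned in situation A, and the function names, the None convention for missing values, and the fill value are assumptions made for the example.

```python
def impute_with_mean(values):
    """Imputation: replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def add_missing_indicator(values, fill=0.0):
    """'Data missing' binary variable: keep a 0/1 flag and fill the gap with a constant."""
    indicator = [1 if v is None else 0 for v in values]
    filled = [fill if v is None else v for v in values]
    return filled, indicator

# Hypothetical column with one missing entry.
column = [1.0, None, 3.0]
print(impute_with_mean(column))       # [1.0, 2.0, 3.0]
print(add_missing_indicator(column))  # ([1.0, 0.0, 3.0], [0, 1, 0])
```

With the indicator approach, a downstream model can learn a separate effect for points where the value was missing, which matters when those points are believed to follow a different distribution.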
Which model is more directly appropriate to estimate the amount of time it will take to
process a certain loan application?
- Linear regression
- Logistic regression
Answer: Linear regression
Which model is more directly appropriate to estimate the likelihood that a flight from
Atlanta to Detroit will take more than two hours?
- Linear regression
- Logistic regression
Answer: Logistic regression
Which model is more directly appropriate to estimate the probability that a patient
survives heart transplant surgery?
- Linear regression
- Logistic regression
Answer: Logistic regression
Which model is more directly appropriate to forecast the number of hot dogs that will be
sold at a baseball game?
- Linear regression
- Logistic regression
Answer: Linear regression
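The pattern in the four questions above is that linear regression estimates a quantity (a time, a count), while logistic regression estimates a probability between 0 and 1. A minimal sketch of the two prediction formulas with a single predictor follows; the coefficient values are made up for illustration and are not fitted to any data.

```python
import math

def linear_predict(intercept, slope, x):
    """Linear regression prediction: an unbounded numeric estimate."""
    return intercept + slope * x

def logistic_predict(intercept, slope, x):
    """Logistic regression prediction: a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Hypothetical coefficients, for illustration only.
estimated_quantity = linear_predict(2.0, 0.5, 10)       # e.g. a processing time
estimated_probability = logistic_predict(-1.0, 0.3, 5)  # e.g. chance of a delay
print(estimated_quantity)
print(estimated_probability)
```

Both models use the same linear combination of predictors; the logistic model passes it through the logistic function so that the output can be read as a probability.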
For each data point, the response is not known but an expert has provided an estimate
of the response.
- Supervised learning model (like classification)
- Unsupervised learning model (like clustering)
Answer: Supervised learning model (like classification)
For each data point, the response is known.
- Supervised learning model (like classification)
- Unsupervised learning model (like clustering)
Answer: Supervised learning model (like classification)
For each data point, the response is not known and there is no expert estimate.
- Supervised learning model (like classification)
- Unsupervised learning model (like clustering)
Answer: Unsupervised learning model (like clustering)
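The distinction in the three questions above comes down to whether a response (known or expert-estimated) is available for each data point. A toy sketch on one-dimensional data follows: a nearest-neighbor classifier stands in for a supervised model and a two-cluster k-means loop stands in for an unsupervised one. The data, labels, and function names are hypothetical.

```python
def nearest_neighbor_label(train_points, train_labels, x):
    """Supervised: predict using the response of the closest labeled point."""
    best = min(range(len(train_points)), key=lambda i: abs(train_points[i] - x))
    return train_labels[best]

def two_means_1d(points, iterations=10):
    """Unsupervised: split points into two clusters using no responses at all."""
    centers = [min(points), max(points)]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return [0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1 for p in points]

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]  # the known responses

print(nearest_neighbor_label(points, labels, 4.5))  # uses the labels
print(two_means_1d(points))                         # ignores the labels
```

The classifier needs the labels list to make a prediction, while the clustering function only ever sees the points themselves.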
A fire department has collected data on how long it takes to put out fires, based on
attributes of the fire (type of fire, size and material of building, etc.) and how many fire
trucks were there. Now, the fire department wants to use that data to predict and make
decisions about fighting fires. For each of the following situations, specify which model
is more appropriate: classification or linear regression.
A. The fire department wants to predict whether or not the fire will be put out in less than
an hour if two fire trucks are sent to the fire.