QMB3302 Final Exam Questions and Answers
In the context of analytics with Python, pipelines are useful for
which of the following reasons? (choose all that apply)
- Pipelines make it easy to repeat/replicate steps and run multiple
models
- Pipelines are good for moving data into your programming
environment
- Pipelines automatically update to new versions of Python
- Pipelines help organize code you used to clean and treat your
data
- Pipelines make it very easy to change small things in your model,
like which variable to include
Ans: - Pipelines make it easy to repeat/replicate steps and run multiple
models
- Pipelines help organize code you used to clean and treat your data
- Pipelines make it very easy to change small things in your model, like
which variable to include
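To make the three correct answers concrete, here is a minimal sketch using scikit-learn's Pipeline. The data and the step names ("impute", "scale", "model") are made up for illustration and are not from the course videos. The cleaning steps sit in one place, the whole sequence reruns with a single fit call, and only the final step is swapped to try a second model.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data: two features, with one missing value (np.nan) in the first column.
X = np.array([[1.0, 10.0], [2.0, 20.0], [np.nan, 30.0], [4.0, 40.0], [5.0, 50.0]])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Cleaning and modeling steps are organized in one object, in order.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardize the features
    ("model", LinearRegression()),                 # final estimator
])

# One call repeats every step: impute, scale, then fit the regression.
pipe.fit(X, y)
y_hat_linear = pipe.predict(X)

# Small change: replace only the final step and rerun the same cleaning steps.
pipe.set_params(model=Ridge(alpha=1.0))
pipe.fit(X, y)
y_hat_ridge = pipe.predict(X)

print(y_hat_linear)
print(y_hat_ridge)
```

Nothing in this sketch loads data into the environment or touches the Python version, which is why those two options are not among the answers.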
The basic idea of a regression is very simple. We have some X
value (which we call ______) and some Y value that we are trying to
_____. We could have multiple Y values, but that is not something
we have covered.
Ans: features; predict
Y and y-hat are a little different. Y is our target vector, and y-hat
is an output in our model that is a.... (choose one of the following)
- estimate or prediction of y
- the actual value of y
- an axis on our 2 way graph
- a combination of XY intercept coordinates
© 2025 All rights reserved
Ans: estimate or prediction of y
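As a rough illustration of the difference between y and y-hat, the sketch below fits a linear regression on made-up numbers (not course data). X holds the features, y is the target vector we observed, and y_hat is what the fitted model predicts for the same rows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up example: one feature column (X) and the observed target vector (y).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = LinearRegression(fit_intercept=True)
model.fit(X, y)

# y_hat is the model's estimate of y, not the actual values.
y_hat = model.predict(X)

# The gap between the actual and predicted values is the residual.
residuals = y - y_hat

print(y)
print(y_hat)
print(residuals)
```

The printed y and y_hat are close but not identical, which is the point of the question: y-hat is an estimate of y.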
When looking at the code in the videos, we sometimes used a
variable to hold our model. What is the significance of the word
"model" in the code below?
model = LinearRegression(fit_intercept=True)
Ans: 'model' is just a variable name holding our linear regression
model. It could be renamed to anything. The word itself is not
important. It is just a container.
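A quick way to check this is to fit the same regression under two different variable names. The data and the second name (my_favorite_regression) are invented for this sketch.

```python
from sklearn.linear_model import LinearRegression

# Made-up data that follows y = 2x + 1 exactly.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]

# Same class, same settings, two different variable names.
model = LinearRegression(fit_intercept=True)
my_favorite_regression = LinearRegression(fit_intercept=True)

model.fit(X, y)
my_favorite_regression.fit(X, y)

# The fitted slope does not depend on what the variable is called.
same_slope = abs(model.coef_[0] - my_favorite_regression.coef_[0]) < 1e-12
print(same_slope)
```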
What is a good model fit value?
Ans: unknowable without knowing/understanding the context of the
domain
Imagine X in the data below is a missing value. If I were to run a
median imputer on this set of data, what value would replace X?
50, 60, 70, 80, 100, 60, 5000, X
Ans: 70
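The arithmetic can be checked by hand: drop the missing value, sort the remaining seven numbers (50, 60, 60, 70, 80, 100, 5000), and take the middle one. The sketch below does the same with Python's standard library, using None as a stand-in for X. In practice a tool such as scikit-learn's SimpleImputer(strategy="median") would do this step for you. The mean is shown only for comparison.

```python
import statistics

# None stands in for the missing value X.
values = [50, 60, 70, 80, 100, 60, 5000, None]

# A median imputer ignores the missing entries when computing the fill value.
observed = [v for v in values if v is not None]
median_fill = statistics.median(observed)
mean_fill = statistics.mean(observed)

# Replace each missing entry with the median of the observed values.
imputed = [median_fill if v is None else v for v in values]

print(median_fill)  # 70
print(mean_fill)    # about 774.29, pulled up by the 5000 outlier
print(imputed)
```

The outlier of 5000 barely affects the median but drags the mean far above most of the data, which is one reason to prefer a median imputer here.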
Which of the below were discussed as being problems with the
holdout method for validation?
- Data is not available for test and control differences
- Outliers can skew the result
- The model is not trained on all of the data
- K=3 is not sufficiently large
- Validation is sometimes too challenging
Ans: - Outliers can skew the result
- The model is not trained on all of the data
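Both problems show up in a small holdout split. The sketch below uses made-up data with one large outlier and scikit-learn's train_test_split. The 25% test size and the seeds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 8 rows, with one large outlier in the target (5000).
X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.array([50.0, 60.0, 70.0, 80.0, 100.0, 60.0, 5000.0, 90.0])

for seed in (0, 1, 2):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    # The model would only ever see the training rows, not all 8.
    # Depending on the split, the outlier lands in train or in test,
    # which can change the fitted model or the test score a lot.
    outlier_in_test = bool(5000.0 in y_test)
    print(seed, len(X_train), len(X_test), outlier_in_test)
```

With 8 rows and a 25% test size, only 6 rows are used for training on every split, and where the outlier falls depends on the random seed.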
The features in a model...
- are used as proxies for y-hat divided by y
- are always functions of each other
- keep the model validation process stable