QMB3302 Final Exam, QMB3302 Final UF
Questions and Answers
Pipelines are useful (in the analytics-with-Python sense) for which of the
following reasons? (choose all that apply)
- Pipelines make it easy to repeat/replicate steps and run multiple
models
- Pipelines are good for moving data into your programming
environment
- Pipelines automatically update to new versions of Python
- Pipelines help organize code you used to clean and treat your
data
- Pipelines make it very easy to change small things in your model,
like which variable to include
Ans: - Pipelines make it easy to repeat/replicate steps and run multiple
models
- Pipelines help organize code you used to clean and treat your data
- Pipelines make it very easy to change small things in your model, like
which variable to include
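A minimal sketch of the three correct answers, assuming scikit-learn (the `LinearRegression` class used later in this guide matches its API). The tiny dataset and the step names "impute" and "model" are made up for illustration. The pipeline bundles the cleaning step and the model so both rerun together, and `set_params` changes one small setting without rewriting the rest.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Made-up data: two features, one value missing (np.nan)
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Each step is a (name, estimator) pair; the names are arbitrary labels
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # cleaning/treatment step
    ("model", LinearRegression(fit_intercept=True)),
])
pipe.fit(X, y)  # imputes first, then fits the regression on the imputed data

# Change one small thing (the imputation strategy) and rerun every step
pipe.set_params(impute__strategy="mean")
pipe.fit(X, y)
predictions = pipe.predict(X)
```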
The basic idea of a regression is very simple. We have some X
value (which we call ______) and some Y value that we are trying to
_____. We could have multiple Y values, but that is not something
we have covered.
Ans: features; predict
Y and y-hat are a little different. Y is our target vector, and y-hat
is an output of our model that is an... (choose one of the following)
- estimate or prediction of y
- the actual value of y
- an axis on our 2 way graph
- a combination of XY intercept coordinates
© 2025 All rights reserved
Ans: estimate or prediction of y
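A small illustration of X (features), y (target) and y-hat, assuming scikit-learn and made-up numbers. `y_hat` holds the model's estimates; it does not equal the actual `y`, and the gap between them is the residual.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: X holds the features (one column), y is the target vector
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])

model = LinearRegression(fit_intercept=True)
model.fit(X, y)

# y_hat is the model's estimate (prediction) of y, not the actual y
y_hat = model.predict(X)
residuals = y - y_hat  # distance between each actual value and its prediction
```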
When looking at the code in the videos, we sometimes used a
variable to hold our model. What is the significance of the word
"model" in the below code?
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)
Ans: 'model' is a named variable and is just holding our linear
regression model. It could be renamed anything. The word itself is not
important. It is just a container.
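To show that the name is only a container, here is a sketch (assuming scikit-learn; the name `anything_else` and the data are made up) that fits the same model under two different variable names and gets identical coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

# Two different variable names holding the same kind of fitted object
model = LinearRegression(fit_intercept=True).fit(X, y)
anything_else = LinearRegression(fit_intercept=True).fit(X, y)

# The variable name is just a label; the fitted results are the same
same = np.allclose(model.coef_, anything_else.coef_)
```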
What is a good model fit value?
Ans: unknowable without knowing/understanding the context of the
domain
Imagine X in the data below is a missing value. If I were to run a
median imputer on this set of data, what would the return value
be?
50, 60, 70, 80, 100, 60, 5000, X
Ans: 70
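A quick check of this answer, assuming scikit-learn's `SimpleImputer` as the median imputer. The missing X is written as `np.nan`. The imputer ignores the missing value, sorts the seven known values (50, 60, 60, 70, 80, 100, 5000) and takes the middle one. For comparison, the mean of those seven values is about 774, so the single 5000 outlier pulls the mean far more than the median.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Values from the question, with X (the missing value) as np.nan
values = np.array([[50], [60], [70], [80], [100], [60], [5000], [np.nan]])

imputer = SimpleImputer(strategy="median")
filled = imputer.fit_transform(values)

# Sorted known values: 50, 60, 60, 70, 80, 100, 5000 -> middle value is 70
print(filled[-1, 0])  # 70.0
```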
Which of the below were discussed as being problems with the
holdout method for validation?
- Data is not available for test and control differences
- Outliers can skew the result
- The model is not trained on all of the data
- K=3 is not large enough
- Validation is sometimes too challenging
Ans: - Outliers can skew the result
- The model is not trained on all of the data
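A sketch contrasting the holdout method with k-fold cross-validation, assuming scikit-learn and randomly generated data (the 25% test size and k=5 are illustrative choices, not from the course). With a single holdout split, the 25 held-out rows never reach training; with k-fold, every row is used for testing once and for training in the other folds.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic data: 100 rows, 2 features, linear signal plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Holdout: one split; the model never trains on the 25 held-out rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
holdout_score = LinearRegression().fit(X_train, y_train).score(X_test, y_test)

# K-fold (k=5): each row is tested once and used for training in the other 4 folds
cv = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(LinearRegression(), X, y, cv=cv)
```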
The features in a model...
- are used as proxies for y-hat divided by y
- are always functions of each other
- keep the model validation process stable
- none of these answers are correct
Ans: none of these answers are correct
What is the first variable in a decision tree called (before any of
the branches)?
Ans: root
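A sketch of where the root lives in a fitted tree, assuming scikit-learn and its bundled iris dataset. In scikit-learn's internal tree structure, node 0 is the root, so `tree_.feature[0]` and `tree_.threshold[0]` describe the first split before any branches. The exact feature chosen depends on the data and settings, so no output is claimed here.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# Node 0 is the root: the first split, before any branches
root_feature = data.feature_names[tree.tree_.feature[0]]
root_threshold = tree.tree_.threshold[0]
print(root_feature, root_threshold)
```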
One problem with decision trees is that they are prone to _____ if
you are not careful or do not set the _____ appropriately.
Ans: overfitting; max depth
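A sketch of the effect of `max_depth`, assuming scikit-learn and its bundled breast cancer dataset (the value 3 is an arbitrary illustrative limit). Without a limit the tree keeps splitting until its leaves are pure or cannot be split further, producing many more levels and leaves; a gap between training and test accuracy is the usual sign of overfitting.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No max_depth: splitting continues until leaves are pure (or can't split further)
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# max_depth=3: at most three levels of splits
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, tree in [("no max_depth", deep), ("max_depth=3", shallow)]:
    # depth, number of leaves, training accuracy, test accuracy
    print(name, tree.get_depth(), tree.get_n_leaves(),
          tree.score(X_train, y_train), tree.score(X_test, y_test))
```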
True or False: The random forest algorithm prevents, or at least
avoids to some extent, the problems with overfitting found in
decision trees.
Ans: True
True or False: Random Forests can only be used on classification
problems
Ans: False
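A sketch showing random forests on both problem types, assuming scikit-learn and made-up data. Each forest trains many trees on bootstrap samples of the rows; the regressor averages the trees' numeric predictions, and scikit-learn's classifier averages the trees' predicted class probabilities. This averaging is what tempers the overfitting of any single tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Made-up data: one feature, a continuous target and a 0/1 target derived from it
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y_continuous = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)  # regression target
y_class = (y_continuous > 0).astype(int)                            # classification target

# Regression: averages the numeric predictions of many trees
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y_continuous)
# Classification: averages the trees' predicted class probabilities
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y_class)

print(reg.predict([[1.5]]), clf.predict([[1.5]]))
```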
True or False: In order to interpret Decision Trees, it is necessary
to first run a linear regression
Ans: False
True or False: Decision Trees are nice because they are fairly
simple and straightforward to interpret
Ans: True
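One way to see why trees are easy to interpret, assuming scikit-learn's `export_text` and its bundled iris dataset: the fitted tree prints as a set of plain if/else rules, with no regression needed first.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# Plain-text view of the splits; each indented line is one rule, leaves show the class
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)
```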
When running our first decision tree, we took out "max_depth=".
This had the unfortunate result of...
Ans: Building a very large, hard-to-understand tree
What is the terminal node as discussed in the lecture?