In the context of analytics with Python, pipelines are useful
for which of the following reasons? (choose all that apply)
- Pipelines make it easy to repeat/replicate steps and run
multiple models
- Pipelines are good for moving data into your programming
environment
- Pipelines automatically update to new versions of Python
- Pipelines help organize code you used to clean and treat
your data
- Pipelines make it very easy to change small things in your
model, like which variable to include
Answer:
- Pipelines make it easy to repeat/replicate steps and run
multiple models
- Pipelines help organize code you used to clean and treat
your data
- Pipelines make it very easy to change small things in your
model, like which variable to include
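The three correct reasons can be seen in a small sketch. It assumes scikit-learn's Pipeline is the kind of pipeline the question means; the data, the step names ("impute", "scale", "model") and the choice of steps are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data: two features with one missing entry, and a target.
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
y = np.array([3.0, 6.0, 9.0, 12.0, 15.0])

# The cleaning steps and the model are organized in one ordered list.
pipe = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardize features
    ("model", LinearRegression()),                 # final estimator
])

# One call repeats every step in order.
pipe.fit(X, y)
y_hat = pipe.predict(X)

# Changing one small thing: switch the imputation strategy and refit.
pipe.set_params(impute__strategy="mean")
pipe.fit(X, y)
```

The same pipe object can be refit on new data or with a different final estimator, which is what makes repeating steps and running multiple models easy.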
The basic idea of a regression is very simple. We have some
X values (which we call ______) and some Y value that we are
trying to _____. We could have multiple Y values, but that is
not something we have covered.
Answer: features; predict
Y and y-hat are a little different. Y is our target vector, and
y-hat is an output of our model that is... (choose one of the
following)
- an estimate or prediction of y
- the actual value of y
- an axis on our 2-way graph
- a combination of XY intercept coordinates
Answer: an estimate or prediction of y
When looking at the code in the videos, we sometimes used a
variable to hold our model. What is the significance of the
word "model" in the code below?
model = LinearRegression(fit_intercept=True)
Answer: 'model' is a named variable and is just holding our
linear regression model. It could be renamed anything.
The word itself is not important. It is just a container.
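A minimal sketch of the same idea, assuming scikit-learn's LinearRegression and made-up data. The variable name my_regressor is arbitrary and chosen here only to show that nothing depends on the word "model". It also shows y-hat as the model's prediction of y.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: X holds the features, y is the target vector.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Any valid variable name works as the container for the estimator.
my_regressor = LinearRegression(fit_intercept=True)
my_regressor.fit(X, y)

# y_hat is the model's estimate of y, not the actual y.
y_hat = my_regressor.predict(X)
residuals = y - y_hat  # how far each prediction is from the true value
```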
What is a good model fit value?
Answer: unknowable without knowing/understanding the context
of the domain
Imagine X in the data below is a missing value. If I were to
run a median imputer on this set of data, what value would be
filled in for X?
50, 60, 70, 80, 100, 60, 5000, X
Answer: 70
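The arithmetic can be checked with the standard library. This is a sketch of what a median imputer does with the seven values shown, not any particular library's implementation; None stands in for X.

```python
from statistics import mean, median

raw = [50, 60, 70, 80, 100, 60, 5000, None]  # None plays the role of X

# The imputer ignores the missing entry when computing the fill value.
observed = [v for v in raw if v is not None]

# Sorted observed values: 50, 60, 60, 70, 80, 100, 5000 -> middle value is 70.
fill_value = median(observed)

# Replace the missing entry with the median of the observed values.
imputed = [fill_value if v is None else v for v in raw]

print(fill_value)      # 70
print(mean(observed))  # about 774.3, pulled up by the 5000 outlier
```

The median stays at 70 despite the 5000 outlier, while a mean imputer would fill in a value larger than every other observation except the outlier.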
Which of the following were discussed as being problems with
the holdout method for validation?
- Data is not available for test and control differences
- Outliers can skew the result
- The model is not trained on all of the data
- K=3 is not sufficiently large
- Validation is sometimes too challenging
Answer:
- Outliers can skew the result
- The model is not trained on all of the data
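Both problems show up in a toy sketch. Everything here is an assumption made for illustration: the data, the helper holdout_mae, and the "model", which just predicts the training mean.

```python
from statistics import mean

data = [50, 60, 70, 80, 100, 60, 5000]  # illustrative values; 5000 is an outlier

def holdout_mae(values, test_idx):
    """Fit a mean-only 'model' on rows outside test_idx, score it on test_idx."""
    test = [values[i] for i in test_idx]
    train = [v for i, v in enumerate(values) if i not in test_idx]
    prediction = mean(train)  # the 'model' only ever sees the training rows
    mae = mean(abs(v - prediction) for v in test)
    return len(train), mae

# Split A: the outlier lands in the training rows.
n_train_a, mae_a = holdout_mae(data, {0, 1})  # MAE = 1007
# Split B: the outlier lands in the test rows.
n_train_b, mae_b = holdout_mae(data, {5, 6})  # MAE = 2470

print(n_train_a, mae_a)
print(n_train_b, mae_b)
```

In both splits only 5 of the 7 rows are used for training, and the score changes a lot depending on where the outlier happens to fall.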
The features in a model...
- are used as proxies for y-hat divided by y
- are always functions of each other
- keep the model validation process stable
- none of these answers are correct
Answer: none of these answers are correct
What is the first variable in a decision tree called (before any
of the branches)?
Answer: root
One problem with decision trees is that they are prone to
_____ if you are not careful or do not set the _____
appropriately.
Answer: overfitting; max depth
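A rough sketch of the effect, assuming scikit-learn's DecisionTreeClassifier and its max_depth parameter. The dataset is synthetic and the depth of 3 is an arbitrary choice for illustration, not a recommended setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy classification data (flip_y adds label noise).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# No depth limit: the tree can keep splitting until it memorizes the training set.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth capped at 3: a simpler tree that cannot memorize every training row.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, tree in [("unlimited", deep), ("max_depth=3", shallow)]:
    print(name, tree.get_depth(),
          tree.score(X_train, y_train), tree.score(X_test, y_test))
```

A large gap between training and test accuracy for the unlimited tree is the usual sign of overfitting; how much capping the depth helps depends on the data.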
True or False: The random forest algorithm prevents, or at
least avoids to some extent, the problems with overfitting
found in decision trees.
Answer: True
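One way to look at this is to compare the train/test gap of a single tree with that of a forest. The sketch assumes scikit-learn's RandomForestClassifier and synthetic data; the result on any real dataset may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy classification data.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A single unrestricted tree versus an ensemble of 100 randomized trees.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# The gap between training and test accuracy is a rough overfitting signal.
tree_gap = tree.score(X_train, y_train) - tree.score(X_test, y_test)
forest_gap = forest.score(X_train, y_train) - forest.score(X_test, y_test)
print("tree gap:", tree_gap)
print("forest gap:", forest_gap)
```

Averaging many trees, each built on a bootstrap sample with random feature subsets, is what lets the forest reduce overfitting to some extent rather than eliminate it.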