Two models are applied to a dataset that has been partitioned. Model A is considerably
more accurate than model B on the training data, but slightly less accurate than model
B on the validation data. Which model are you more likely to consider for final
deployment? - ANSWER: Model B
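
The reasoning behind this answer can be sketched in a few lines of Python. The accuracy figures below are hypothetical placeholders, not results from any real model. The point is only that models are compared on validation accuracy, because a large gap between training and validation accuracy suggests overfitting.

```python
# Hypothetical accuracies, invented purely for illustration.
models = {
    "A": {"train": 0.98, "validation": 0.84},
    "B": {"train": 0.88, "validation": 0.86},
}

def pick_by_validation(results):
    """Return the name of the model with the highest validation accuracy."""
    return max(results, key=lambda name: results[name]["validation"])

best = pick_by_validation(models)

# Gap between training and validation accuracy; a large gap hints at overfitting.
gap = {name: round(r["train"] - r["validation"], 2) for name, r in models.items()}

print("selected model:", best)
print("train/validation gap:", gap)
```

Training accuracy is deliberately ignored in the selection step, since it rewards a model for memorizing the data it was fitted on.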
Assuming that data mining techniques are to be used in the following case, identify
whether the task required is supervised or unsupervised learning.
Estimating the repair time required for an aircraft based on a trouble ticket. -
ANSWER: Supervised
Assuming that data mining techniques are to be used in the following case, identify
whether the task required is supervised or unsupervised learning.
Printing of custom discount coupons at the conclusion of a grocery store checkout
based on what you just bought and what others have bought previously. -
ANSWER: Unsupervised
For prediction models, a good rule of thumb is to have ______ records for every
predictor variable. - ANSWER: 10
Assuming that data mining techniques are to be used in the following case, identify
whether the task required is supervised or unsupervised learning.
Automated sorting of mail by zip code scanning. - ANSWER: Supervised
Assuming that data mining techniques are to be used in the following case, identify
whether the task required is supervised or unsupervised learning.
Identifying a network data packet as dangerous (virus, hacker attack) based on
comparison to other packets whose threat status is known. - ANSWER: Supervised
Assuming that data mining techniques are to be used in the following case, identify
whether the task required is supervised or unsupervised learning.
Identifying segments of similar customers. - ANSWER: Unsupervised
A dataset has 1000 records and 50 variables with 5% of the values missing, spread
randomly throughout the records and variables. An analyst decides to remove records
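with missing values. About how many records would you expect to be removed? -
ANSWER: About 923 records (92.31% of records). The probability that a record has no
missing values in any of its 50 variables is 0.95^50, or about 0.0769.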
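
The arithmetic behind the 92.31% figure can be checked with a short sketch. It assumes, as the question states, that missing values are spread randomly, which is treated here as independence across the 50 variables. The variable names are illustrative only.

```python
n_records = 1000
n_variables = 50
p_missing = 0.05

# Probability that all 50 values in a record are present,
# assuming each value is missing independently with probability 0.05.
p_complete = (1 - p_missing) ** n_variables

# A record is removed if at least one of its values is missing.
p_removed = 1 - p_complete
expected_removed = n_records * p_removed

print(f"share of records removed: {p_removed:.2%}")
print(f"expected records removed: {expected_removed:.0f}")
```

Even a small per-value missing rate removes most records when there are many variables, which is why dropping incomplete records is often impractical for wide datasets.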
Find matches for the data mining procedures. - ANSWER:
Linear regression - supervised learning
Collaborative filtering - unsupervised learning
Neural nets - supervised learning
Association rules - unsupervised learning
Regression trees - supervised learning
Logistic regression - supervised learning
Principal components - unsupervised learning
Cluster analysis - unsupervised learning
Classification trees - supervised learning
k-Nearest-neighbors - supervised learning
Find matches for the following terms. - ANSWER:
Unsupervised learning - An analysis in which one attempts to learn patterns in the data
other than predicting an output value of interest.
Supervised learning - The process of providing an algorithm (logistic regression,
regression tree, etc.) with records in which an output variable of interest is known, so
that the algorithm "learns" how to predict this value for new records where the output is
unknown.
Validation set - The portion of the data used to assess how well the model fits, to adjust
models, and to select the best model from among those that have been tried.
Test set - The portion of the data used only at the end of the model building and
selection process to assess how well the final model might perform on new data.
Training set - The portion of the data used to fit a model.
Algorithm - A specific procedure used to implement a particular data mining technique:
classification tree, discriminant analysis, and the like.
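
The three partitions defined above can be produced with a random split. The following is a minimal sketch using only the Python standard library. The function name `partition`, the 60/20/20 proportions, and the seed are assumptions chosen for illustration, not values from the source.

```python
import random

def partition(records, train_frac=0.6, valid_frac=0.2, seed=1):
    """Shuffle records and split them into training, validation, and test sets.

    Whatever is left after the training and validation shares goes to the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# Record IDs 0..99 stand in for real data rows.
train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

The model is fitted on `train`, competing models are compared on `valid`, and `test` is touched only once, after the final model has been chosen.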
The second principal component represents any linear combination of the variables that
accounts for the most variability in the data, once the first principal component has been
extracted. - ANSWER: False
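
One likely reason the statement is false is the phrase "any linear combination": the second principal component is constrained to be uncorrelated with (orthogonal to) the first. The sketch below illustrates that constraint on a small two-variable example, using the closed-form eigen decomposition of a 2x2 covariance matrix. The data values and helper names are assumptions made for illustration.

```python
import math

# Small illustrative two-variable dataset (not from the source).
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

sxx, syy, sxy = cov(x, x), cov(y, y), cov(x, y)

# Eigenvalues of the symmetric covariance matrix [[sxx, sxy], [sxy, syy]].
mean = (sxx + syy) / 2
half_diff = math.hypot((sxx - syy) / 2, sxy)
lam1, lam2 = mean + half_diff, mean - half_diff  # variances along PC1 and PC2

def unit_eigenvector(lam):
    """Unit eigenvector for eigenvalue lam (valid when sxy != 0)."""
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

pc1 = unit_eigenvector(lam1)
pc2 = unit_eigenvector(lam2)

# The two component directions are orthogonal: their dot product is (numerically) zero.
dot = pc1[0] * pc2[0] + pc1[1] * pc2[1]
print("variance along PC1:", lam1)
print("variance along PC2:", lam2)
print("PC1 . PC2 =", dot)
```

With only two variables, the second component is fully determined once the first is fixed. With more variables, it is the direction of greatest remaining variance among those orthogonal to the first.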
What plots do you use to study the relationship of a numerical outcome to categorical
predictors? - ANSWER: Bar charts, multiple panels, side-by-side boxplots
What plots do you use to determine the need for transformations of the numerical
outcome variable or numerical predictors? - ANSWER: Boxplots, histograms