The marketing department of ACME Corporation needs to identify potential high-value customers for
its new Kitchen Robot. These robots are expensive, so the goal is to identify customers who can
afford such a machine. It has been determined that households with a net income greater than $50,000
USD are of interest in the marketing campaign, so you will choose a(n) ___________ algorithm to model
these customers.
classification
regression
affinity analysis
recommender system - Precise Answer ✔✔classification
Reducing the number of predictors to the smallest set that will still provide accurate predictions is a
concept called ____________________.
regeneration
parsimony
shrinkage
Gillette's razor - Precise Answer ✔✔parsimony
You have been given a dataset with 15 predictors and a binary outcome that denotes whether a
customer has left the company (yes or no). As an absolute minimum, you'll need _____________
samples to achieve a minimally accurate prediction.
150
180
190
200 - Precise Answer ✔✔180
You have been given a dataset with 15 predictors and a numeric outcome that denotes the income that
a household has obtained. As an absolute minimum, you'll need _____________ samples to achieve a
minimally accurate prediction.
200
180
300
150 - Precise Answer ✔✔150
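The two sample-size answers above are consistent with a common rule of thumb: about 10 records per predictor for a numeric outcome, and about 6 x (number of classes) x (number of predictors) for a classification outcome. The sketch below assumes that this rule is the one the questions intend; the function name `min_records` is hypothetical.

```python
def min_records(n_predictors, n_classes=None):
    """Rule-of-thumb minimum sample size (an assumed heuristic, not a guarantee).

    Numeric outcome:        10 * p
    Classification outcome: 6 * m * p   (m = number of classes)
    """
    if n_classes is None:
        return 10 * n_predictors
    return 6 * n_classes * n_predictors


# 15 predictors, binary outcome (m = 2)
binary_minimum = min_records(15, n_classes=2)

# 15 predictors, numeric outcome
numeric_minimum = min_records(15)

print(binary_minimum, numeric_minimum)
```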
The process of identifying outliers is best performed by someone with domain knowledge as opposed to
someone with statistical knowledge.
True
False - Precise Answer ✔✔True
If you impute a missing value with its column mean, then you will ___________________.
maximize the variability of the dataset
overweight the variability of the dataset
understate the variability of the dataset
normalize the variability of the dataset - Precise Answer ✔✔understate the variability of the dataset
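A small sketch of why mean imputation understates variability: the imputed values sit exactly on the mean, so they add nothing to the sum of squared deviations while the record count still grows. The data values here are made up for illustration.

```python
import statistics

# Observed (non-missing) values of one column; illustrative numbers only.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
n_missing = 3

# Fill every missing entry with the column mean.
col_mean = statistics.mean(observed)
imputed = observed + [col_mean] * n_missing

# Sample variance before and after imputation.
var_before = statistics.variance(observed)
var_after = statistics.variance(imputed)

print(f"variance before: {var_before:.2f}, after: {var_after:.2f}")
```

The variance after imputation is smaller because the numerator is unchanged while the denominator increases.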
Standardization uses the following formula: z = (x - mean) / standard deviation
Using the rule-of-thumb method, one can assume that all extreme values (outliers) will be greater than
____________ or less than ____________.
0, 1
1, 0
+3, -3
+1, -1 - Precise Answer ✔✔+3, -3
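The +3 / -3 rule of thumb applies to standardized values (z-scores). The sketch below standardizes a column and flags values whose z-score falls outside that band; the helper names and the sample data are hypothetical, and the sample standard deviation is an assumed choice.

```python
import statistics


def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample std. dev."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def flag_outliers(values, threshold=3.0):
    """Return the values whose z-score is above +threshold or below -threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Nineteen ordinary values and one extreme value (illustrative data).
data = [48, 49, 50, 51, 52] * 3 + [49, 50, 51, 50] + [200]

outliers = flag_outliers(data)
print(outliers)
```

Note that with very small samples no value can reach a z-score of 3, so the rule is only meaningful with enough records.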
In contrast to standardization, normalization (i.e., MinMaxScaler in scikit-learn) fits all values between
__________________.
-3 and +3
0 and 1
a lower and an upper boundary selected by the data analyst
-infinity, +infinity - Precise Answer ✔✔a lower and an upper boundary selected by the data analyst
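A minimal pure-Python sketch of min-max normalization with analyst-chosen boundaries. The `feature_range` argument mirrors the parameter of the same name on scikit-learn's MinMaxScaler, whose default is (0, 1); the function itself is a hypothetical stand-in, not the library code.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale values linearly so the minimum maps to the lower bound
    and the maximum maps to the upper bound of feature_range."""
    lower, upper = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant column: map everything to the lower bound (assumed choice).
        return [lower for _ in values]
    return [lower + (v - v_min) / span * (upper - lower) for v in values]


incomes = [10, 20, 30]  # illustrative data

default_scaled = min_max_scale(incomes)                        # bounds 0 and 1
custom_scaled = min_max_scale(incomes, feature_range=(-1, 1))  # analyst-chosen bounds

print(default_scaled, custom_scaled)
```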
Overfitting occurs when ___________________ is low, which makes ______________ higher.
variance, bias
bias, variance
irreducible error, reducible error
sampling, accuracy - Precise Answer ✔✔bias, variance
As a means to control excessive bias, we can use ______________.
data partitions
dimension reduction
standardization
normalization - Precise Answer ✔✔data partitions
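A data partition holds back part of the records so that a model can be judged on data it was not fitted to. The sketch below shows one simple way to create a training and a validation partition; the 60/40 split, the seed, and the function name are assumptions for illustration.

```python
import random


def partition(records, train_fraction=0.6, seed=42):
    """Shuffle the records and split them into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, validation = partition(range(10))
print(len(train), len(validation))
```

Performance that is much better on the training partition than on the validation partition is the usual sign that the model does not generalize.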
When dealing with a class imbalance in a classification model, the data analyst can _____________ the
minority class or ________________ the majority class.
underweight, overweight
underweight, oversample
subsample, oversample
overweight, underweight - Precise Answer ✔✔overweight, underweight
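One way to overweight the minority class and underweight the majority class is to give each class a weight inversely proportional to its frequency. The sketch below uses the formula n_samples / (n_classes * class_count), which is the same heuristic scikit-learn applies for class_weight="balanced"; the function name and the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# 90 customers stayed ("no"), 10 left ("yes"): a 9-to-1 imbalance.
labels = ["no"] * 90 + ["yes"] * 10

weights = balanced_class_weights(labels)
print(weights)
```

The rare "yes" class receives a weight above 1 and the common "no" class a weight below 1, so errors on the minority class count for more during training.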
You are given a dataset that has many duplicate entries, that is, customers who appear multiple times in
the data because of address changes, marriages, and mis-entries (boulevard instead of blvd., etc.).
Because of this situation, you will choose a(n) __________ algorithm to clean up the dataset.
dimension reduction
data reduction
adaptive filtering
collaborative filtering - Precise Answer ✔✔data reduction
When working with linear or logistic regression, categorical variables must have one subtype removed
when one-hot encoding (dummy coding) or else the model will fail.
True
False - Precise Answer ✔✔True
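A minimal sketch of dummy coding with one subtype removed. With all k indicator columns present, the columns always sum to 1 and duplicate the intercept (perfect multicollinearity), so k - 1 columns are kept and the dropped level becomes the reference. The function name and the choice to drop the alphabetically first level are assumptions.

```python
def dummy_code(values, drop_first=True):
    """One-hot encode a categorical column, optionally dropping one level."""
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    return [{f"is_{lvl}": int(v == lvl) for lvl in kept} for v in values]


colors = ["red", "green", "blue", "green"]  # illustrative data

rows = dummy_code(colors)
for color, row in zip(colors, rows):
    print(color, row)
```

Here "blue" is the reference level: a row with zeros in every remaining column is a blue record.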
Because machine learning is automated, there is not any human bias or discrimination in the results.
True
False - Precise Answer ✔✔False
An individual's behavior can be psychologically manipulated through the analysis of their Facebook data.
True
False - Precise Answer ✔✔True