Solutions
Bayesian Machine Learning: Compare and contrast maximum
likelihood and maximum a posteriori estimation. Correct
Answers Maximum likelihood estimation (MLE) picks the
parameters that maximize the likelihood of the data, p(D|θ).
Maximum a posteriori (MAP) estimation maximizes the posterior
p(θ|D) ∝ p(D|θ)p(θ), so it additionally weighs a prior over the
parameters. MAP reduces to MLE when the prior is uniform, and
the prior acts as a regularizer (e.g., a Gaussian prior on linear
model weights corresponds to L2 regularization).
Can any similarity function be used for SVM? Correct Answers
No. The kernel must satisfy Mercer's condition: it must be
symmetric and positive semi-definite, so that it corresponds to an
inner product in some feature space.
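As a minimal sketch of the point above: sklearn's SVC accepts a callable kernel, so a hand-written kernel works as long as it is a valid (Mercer) kernel. Here an RBF kernel is implemented by hand; the data and gamma value are illustrative choices, not from the original.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# A hand-written RBF kernel: symmetric and positive semi-definite,
# so it satisfies Mercer's condition and is safe to use with an SVM.
def rbf_kernel(A, B, gamma=1.0):
    # Squared Euclidean distances between all rows of A and all rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = SVC(kernel=rbf_kernel).fit(X, y)  # SVC calls the kernel on (X, X)
train_acc = clf.score(X, y)
```

An arbitrary similarity function with no PSD guarantee could produce an indefinite Gram matrix, and the SVM's convex optimization problem would no longer be well-posed.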
Can we apply the kernel trick to logistic regression? Why is it
not used in practice then? Correct Answers Yes, logistic
regression can be kernelized. It is rarely used in practice because,
unlike the SVM, its solution is not sparse: every training point
receives a nonzero weight, so prediction requires evaluating the
kernel against the entire training set, which is expensive for
large datasets.
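A minimal sketch of kernelized logistic regression: use the kernel matrix K (one column per training point) as the feature representation and fit an ordinary logistic regression on it. The dataset and gamma are illustrative assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

# Concentric circles: not separable by any linear boundary.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = LogisticRegression().fit(X, y)
linear_acc = linear.score(X, y)  # roughly chance level on this data

# Kernel trick: represent each point by its similarity to every
# training point. The model now has one coefficient per *training
# example* -- this dense dependence on the whole training set is
# why kernelized logistic regression is rarely used in practice.
K = rbf_kernel(X, X, gamma=2.0)
kernelized = LogisticRegression(max_iter=1000).fit(K, y)
kernel_acc = kernelized.score(K, y)
```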
Can you list some disadvantages related to linear models?
Correct Answers There are many disadvantages to using linear
models, but the main ones are:
- The linearity assumption is often violated, so nonlinear
relationships are modeled poorly
- They assume independent (non-autocorrelated) errors
- They are sensitive to outliers and offer no built-in protection
against overfitting when there are many correlated features
- Plain linear regression is not suited to binary or categorical
outcomes
https://www.quora.com/What-are-the-limitations-of-linear-regression-modeling-in-data-analysis
Compare GMM vs GDA. Correct Answers Both model the data
with Gaussians. Gaussian discriminant analysis (GDA) is
supervised: each class gets its own Gaussian, fit from labeled
examples, and Bayes' rule gives the class posteriors. A Gaussian
mixture model (GMM) is unsupervised: the component
assignments are latent and are learned with the EM algorithm. A
GMM can be viewed as GDA with the class labels hidden.
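A small sketch of the GMM/GDA contrast, using sklearn's QDA as a stand-in for GDA (it likewise fits one full-covariance Gaussian per class); the synthetic blob data is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.mixture import GaussianMixture

# Two well-separated Gaussian clusters (centers chosen for illustration).
X, y = make_blobs(n_samples=300, centers=[[-4, 0], [4, 0]],
                  cluster_std=1.0, random_state=0)

# GDA (via QDA, one Gaussian per class) is supervised: it uses y.
gda = QuadraticDiscriminantAnalysis().fit(X, y)
gda_acc = gda.score(X, y)

# A GMM is unsupervised: it never sees y, and recovers the two
# Gaussian components with the EM algorithm instead.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
clusters = gmm.predict(X)

# The GMM components should line up with the true classes,
# up to an arbitrary relabeling of the components.
agreement = max(np.mean(clusters == y), np.mean(clusters != y))
```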
DECISION TREES: Are decision trees parametric or non-
parametric models? Correct Answers Non-parametric. The
number of model parameters is not determined before creating
the model.
DECISION TREES: How is feature importance evaluated in
decision-tree-based models? Correct Answers The features that
are split on most frequently and are closest to the top of the tree,
thus affecting the largest number of samples, are considered to
be the most important.
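In sklearn this idea is exposed as `feature_importances_`, which aggregates each feature's impurity reduction weighted by the number of samples its splits affect, so splits near the top of the tree contribute most. A minimal sketch on the iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# One importance score per feature; the scores sum to 1.
importances = tree.feature_importances_
```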
DECISION TREES: What are some common uses of decision
tree algorithms? Correct Answers 1. Classification
2. Regression
3. Measuring feature importance
4. Feature selection
DECISION TREES: What are some ways to reduce overfitting
with decision trees? Correct Answers - Reduce maximum depth
- Increase min samples split
- Balance your data to prevent bias toward dominant classes
- Increase the number of samples
- Decrease the number of features
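The first two controls above can be sketched as follows; the dataset and hyperparameter values are illustrative assumptions, not prescriptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree memorizes the training set.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Limiting depth and raising the split threshold regularize the tree.
pruned = DecisionTreeClassifier(
    max_depth=4, min_samples_split=20, random_state=0
).fit(X_tr, y_tr)

# Train/test gap is a rough proxy for overfitting.
deep_gap = deep.score(X_tr, y_tr) - deep.score(X_te, y_te)
pruned_gap = pruned.score(X_tr, y_tr) - pruned.score(X_te, y_te)
```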
DECISION TREES: What are the main hyperparameters that
you can tune for decision trees? Correct Answers Generally
speaking, we have the following parameters:
max depth - maximum tree depth
min samples split - minimum number of samples for a node to
be split
min samples leaf - minimum number of samples for each leaf
node
max leaf nodes - the maximum number of leaf nodes in the tree
max features - maximum number of features that are evaluated
for splitting at each node (only valid for algorithms that
randomize features considered at each split)
Other similar hyperparameters may be derived from the above
hyperparameters.
The "traditional" decision tree is greedy and considers all
features at each split point, but many modern implementations
(such as sklearn's) allow splitting on a random subset of features,
so max features may or may not be a tunable hyperparameter.
DECISION TREES: What do high and low Entropy scores
mean? Correct Answers Low Entropy (near 0) = most records
from the sample are in the same class
High Entropy (maximum of 1 for two classes; log2 of the
number of classes in general) = records from sample are spread
evenly across classes
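The extremes above can be checked with a few lines of code; `entropy` here is a hand-written helper, not a library function:

```python
from math import log2

def entropy(class_counts):
    """Shannon entropy (in bits) of a class distribution."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return -sum(p * log2(p) for p in probs)

pure = entropy([10, 0])         # all records in one class -> 0.0
mixed = entropy([5, 5])         # 50/50 split -> 1.0, the 2-class maximum
three_way = entropy([4, 4, 4])  # log2(3) for 3 balanced classes
```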
DECISION TREES: What do high and low Gini scores mean?
Correct Answers Low Gini (near 0) = most records from the
sample are in the same class
High Gini (maximum of 1 or less, depending on number of
classes) = records from sample are spread evenly across classes
DECISION TREES: What is entropy? Correct Answers
Entropy is a measure of the impurity (disorder) of the class
distribution among the samples at a node. It is very similar to
Gini impurity in concept, but uses a slightly different calculation.
DECISION TREES: What is Gini impurity? Correct Answers
Gini impurity (also called the Gini index) is a measurement of
how often a randomly chosen record would be incorrectly
classified if it was randomly classified using the distribution of
the set of samples.
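That definition translates directly into code; `gini` here is a hand-written helper for illustration:

```python
def gini(class_counts):
    """Gini impurity: probability that a random record is mislabeled
    when labels are drawn from the sample's own class distribution."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

pure = gini([10, 0])         # 0.0: no chance of misclassification
binary = gini([5, 5])        # 0.5: the two-class maximum
three_way = gini([4, 4, 4])  # 2/3: the maximum approaches 1 as the
                             # number of classes grows
```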
DECISION TREES: What metrics are usually used to compute
splits? Correct Answers Gini impurity or entropy. Both
generally produce similar results.
DECISION TREES: Explain how each hyperparameter affects the
model's ability to learn. Correct Answers Generally speaking...
max depth - increasing max depth decreases bias and increases
variance
min samples split - increasing min samples split increases bias
and decreases variance
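The max depth trade-off can be sketched by comparing a depth-1 stump with an unlimited-depth tree on held-out data; the dataset is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for depth in (1, None):  # None = grow until leaves are pure
    t = DecisionTreeClassifier(max_depth=depth, random_state=0)
    t.fit(X_tr, y_tr)
    scores[depth] = (t.score(X_tr, y_tr), t.score(X_te, y_te))

# A depth-1 stump underfits (high bias, low train accuracy); the
# unlimited tree fits training data perfectly (low bias, higher
# variance), and its train/test gap is where the variance shows up.
```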