Mining Midterm
What occurs if our model makes no assumptions? - correct answer-No
learning occurs!
How many possible mappings exist for a binary vector of length d mapping to
a binary output? - correct answer-2^(2^(d)) possible mappings
What if we have n data points? - correct answer-Still 2^(2^(d) - n) -- SO
many!
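To make the blow-up concrete, here is a tiny Python check (the loop bound d <= 4 is an arbitrary choice of mine):

    # Every binary function f: {0,1}^d -> {0,1} is a truth table with 2^d rows,
    # and each row can output 0 or 1, giving 2^(2^d) possible mappings.
    for d in range(1, 5):
        rows = 2 ** d
        print(f"d={d}: {rows} inputs, {2 ** rows} possible mappings")
    # Observing n labeled points fixes only n rows of the table,
    # so 2^(2^d - n) mappings still agree with the data.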
What assumption do we make for kNN? - correct answer-(i) Label changes
smoothly as features change in local regions; (ii) Each feature affects the
output independently
What assumption do we make for logistic regression? - correct answer-(i) The
relationship between input and output can be expressed linearly;
(ii) Label changes smoothly as features change locally;
(iii) Each feature affects the output independently
Which algorithms share a similar assumption? - correct answer-SVM,
perceptron, and linear regression - examples can be linearly separated or
predicted with a linear model
What is modelling error, and how do we reduce it? - correct answer-You chose
the wrong model / hypothesis space. Reduce by choosing a better model.
What is estimation error, and how do we reduce it? - correct answer-You didn't
have enough data. Reduce by adding more data (it vanishes in the limit of infinite data).
What is optimization error, and how do we reduce it? - correct answer-Your
model was not optimized well. Reduce by optimizing longer (infinite training
time), by substituting a better optimization algorithm, or by applying more
expensive optimization.
What is Bayes' error, and how do we reduce it? - correct answer-Your model
was unable to distinguish between overlapping distributions. It cannot be
reduced on a given dataset, UNLESS a new feature is introduced which
meaningfully discriminates between instances with the same features but
different labels. If that is impossible, this error is called "irreducible".
What is overfitting? - correct answer-Model performs well on training, but
poorly on validation or test data.
What is underfitting? - correct answer-Model performs badly on training,
validation, and test data.
What is model selection? - correct answer-The process of finding the proper
hypothesis space (AKA the "model class") that neither underfits nor overfits.
This is challenging!
What is kNN? - correct answer-A type of model that predicts the label of an
unknown example from some weighted average (or vote) of the k neighbors
closest to it in distance (the k-nearest). In the vanilla model, all
weights w_i = 1.
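A minimal sketch of a vanilla kNN classifier (my own illustrative code; Euclidean distance and uniform weights assumed):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k):
        # Euclidean distance from the query x to every training example.
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k closest training points.
        nearest = np.argsort(dists)[:k]
        # Vanilla model: all weights w_i = 1, i.e. a plain majority vote.
        return Counter(y_train[nearest]).most_common(1)[0][0]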
What are the effects of extreme k-values in k-NN? - correct answer-At k=1,
training error is zero when Bayes error is zero. At k=n, every point is a
neighbor, leading to the majority class being predicted everywhere in the
dataset.
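Both extremes can be checked against the knn_predict sketch above (toy data invented for illustration):

    import numpy as np

    X = np.array([[0.0], [1.0], [2.0], [10.0]])
    y = np.array([0, 0, 0, 1])  # majority class is 0

    # k=1: each training point is its own nearest neighbor -> zero training error.
    print([knn_predict(X, y, x, k=1) for x in X])  # [0, 0, 0, 1]
    # k=n: every point is a neighbor -> majority class predicted everywhere.
    print([knn_predict(X, y, x, k=4) for x in X])  # [0, 0, 0, 0]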
What would a generalization of k-NN to a regression task entail? - correct
answer-Take the weighted average of the k nearest neighbors' values, and predict
that. For 1-NN, just predict the value of the single closest neighbor.
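A sketch of the regression variant under the same assumptions (uniform weights; names are hypothetical):

    import numpy as np

    def knn_regress(X_train, y_train, x, k):
        # Same neighbor search as in classification...
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]
        # ...but predict the (uniformly weighted) mean of the k nearest targets.
        return y_train[nearest].mean()

With k = n the neighbor set is the whole training set, so the prediction collapses to y_train.mean() -- exactly the next card.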
For a regression k-NN, what would we predict when k=n? - correct
answer-The predicted value would equal the average value of the dataset.
What are the problems with k-NN? - correct answer-- Computationally
expensive: prediction costs O(nd) per test point, although a proper choice of
data structure can reduce this cost.
- Massive datasets require storing lots of examples, but this can be reduced if
we remove "unimportant" examples lying in "safe" regions surrounded by many
points with the same label.
- The relative scale of features matters as well, because large distances in
large-scale features matter more than small distances in small-scale features,
so we should scale features to [0, 1] (see the sketch below).
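A minimal min-max scaling sketch (column-wise on a NumPy feature matrix; the function name is my own):

    import numpy as np

    def minmax_scale(X):
        # Rescale each feature (column) to [0, 1] so no feature's raw
        # magnitude dominates the distance computation.
        lo, hi = X.min(axis=0), X.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)  # avoid divide-by-zero on constant columns
        return (X - lo) / span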