Summary Cheatsheet Final Exam Machine Learning

Detailed cheatsheet with all important notes for the final exam of the course Machine Learning. I passed the exam by studying only this cheatsheet.

Document information

Uploaded on: February 16, 2020
Number of pages: 2
Written in: 2019/2020
Type: Summary

Content preview

Introduction. Machine Learning: learning to solve problems from examples; come up with an algorithm by learning from data and then apply it to new cases. Binary classification: either x vs. y (positive vs. negative) or x vs. non-x (spam vs. non-spam). ROC curve: plots the true positive rate (sensitivity) against the false positive rate (1 – specificity). Cross validation: break the training data into 10 parts, train on 9 and test on the remaining 1, rotating the held-out part. LOO: cross validation with K = N folds, i.e. leaving one example out at a time. Good for KNN, otherwise expensive to run.
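A minimal k-fold split along these lines (not from the cheatsheet itself; `train` and `evaluate` are hypothetical placeholders for any model's fit and score routines):

```python
# K-fold cross validation sketch: train on k-1 folds, test on the held-out fold, rotate.
def k_fold_cv(examples, labels, k, train, evaluate):
    n = len(examples)
    fold_size = n // k
    scores = []
    for i in range(k):
        start = i * fold_size
        stop = (i + 1) * fold_size if i < k - 1 else n
        test_x, test_y = examples[start:stop], labels[start:stop]
        train_x = examples[:start] + examples[stop:]
        train_y = labels[:start] + labels[stop:]
        model = train(train_x, train_y)                   # fit on the other k-1 folds
        scores.append(evaluate(model, test_x, test_y))    # score on the held-out fold
    return sum(scores) / k                                # average score over the folds

# Leave-one-out (LOO) is the special case k = len(examples).
```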
Confidence: a confidence of 95% means that if you reran the experiment 100 times, roughly 95 of those runs would do at least as well. Debugging: collect data, choose features, choose a model family, choose training data, train the model, evaluate on test data. Canonical Learning Problems: regression, binary classification, multiclass classification, multi-label classification, ranking, sequence labelling, sequence-to-sequence labelling, autonomous behaviour. MSE: average squared difference between true and predicted value. MAE: average absolute difference between true and predicted value. FP: not spam, marked as spam. FN: not marked as spam, is spam. TP: marked as spam, is spam. TN: not marked as spam, is not spam. Accuracy: proportion of correct predictions, (TP + TN) / (P + N) = 1 – error rate. Error: proportion of mistakes. Precision: of everything marked as x, how much was actually x? P = TP / (TP + FP). Recall: of all x out there, how much did we find? R = TP / (TP + FN). F-score: harmonic mean of precision and recall, F1 = 2 * ((P * R) / (P + R)). Macro Average: compute precision and recall per class, then average over classes. Micro Average: pool the counts over all classes (each correct prediction a TP, each missed label an FN, each incorrect prediction an FP) and compute the metrics once.
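The confusion-matrix metrics above can be computed roughly like this (an illustrative sketch, not course code):

```python
# Binary evaluation metrics from true vs. predicted labels.
def binary_metrics(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    accuracy = (tp + tn) / len(y_true)                        # = 1 - error rate
    precision = tp / (tp + fp) if tp + fp else 0.0            # of everything marked x, how much was x?
    recall = tp / (tp + fn) if tp + fn else 0.0               # of all x out there, how much did we find?
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))       # approx (0.6, 0.67, 0.67, 0.67)
```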
Perceptron. Computes a weighted sum of the input features (plus a bias); if the sum >= 0 it outputs +1, otherwise it outputs -1. Linear Classifier: the simplest linear model, finds simple boundaries separating +1 from -1. Discriminant: f(x) = w · x + b. Bias: decides which class the node should be pushed to and does not depend on the input value. When w · x = 0, the bias decides which class to predict → it makes the default decision → it biases the classifier towards the positive or the negative class. In the beginning everything is random; after iterating, the weights and biases are gradually shifted so that the next result is closer to the desired output. Error-driven: the perceptron is online and looks at one example at a time. If it is doing well it does not update its parameters (only when an error occurs). Finding (w, b): go through all examples and try the current (w, b); if correct → continue, otherwise → adjust (w, b). Streaming data: data which does not stop coming (recordings from sensors, social media posts, news articles). Online: online learners like the perceptron are good for streaming data. An online algorithm only remembers the current example; it can imitate batch learning by iterating over the data several times in order to extract more information from it. Evaluating online learning: predict the current example → record correct or not → update the model (if necessary) → move to the next example. Always check the error rate, and never evaluate/test on examples which were used as training data. Early stopping: stop training when the error on the validation data stops dropping. When the training error goes down but the validation error goes up → overfitting. Sparsity: a sparse representation omits zero values.
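A sketch of this error-driven loop; the notes only say "adjust (w, b)", so the standard perceptron update (w ← w + y·x, b ← b + y) is assumed here:

```python
# Perceptron training: predict with the current (w, b), update only on errors.
def train_perceptron(examples, labels, epochs=10):
    n_features = len(examples[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):                          # imitate batch learning: several passes
        for x, y in zip(examples, labels):           # y is +1 or -1
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            prediction = 1 if activation >= 0 else -1
            if prediction != y:                      # error-driven: only update on a mistake
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b = b + y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```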
Gradient Descent. A model uses inputs to predict outputs; gradient descent finds the model parameters with the lowest error and is not limited to linear models. Optimization algorithm: how the model learns: model + optimization. Optimization means finding a minimum or maximum of a function. However, optimizing the zero/one loss is hard. An option is to concoct an S-shaped function which is smooth and potentially easier to optimize, but it is not convex. Convex function: looks like a happy face (a valley), easy to minimize. Concave function: looks like a sad face (a hill). Surrogate Loss Functions: convex, non-negative stand-ins for the zero/one loss: hinge loss, logistic loss, exponential loss, squared loss. SSE: Sum of Squared Errors, used for measuring error. Finding w: start with a random value for w → check the slope of the function → descend the slope → adjust w to decrease f(w). First Derivative: if we define f(w) = w², the first derivative is f'(w) = 2w. Slope: describes the steepness in a single dimension; the gradient is the collection of slopes, one for each dimension. To compute: take the first derivative → for a function f the first derivative can be written f' → then f'(a) is the slope of f at point a. Basic Gradient Descent for f(w) = w²: initialize w to some value (e.g. 10) → repeatedly update w ← w − η·f'(w), where the learning rate η controls the speed of descent → stop when w no longer changes. If the learning rate is too big → we get further away from the solution instead of closer.
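Following that recipe for f(w) = w² (the learning rate and stopping tolerance below are illustrative choices, not values from the notes):

```python
# Basic gradient descent on f(w) = w**2, whose derivative is f'(w) = 2*w.
def gradient_descent(f_prime, w=10.0, learning_rate=0.1, tolerance=1e-8):
    while True:
        w_new = w - learning_rate * f_prime(w)   # descend the slope
        if abs(w_new - w) < tolerance:           # stop when w no longer changes
            return w_new
        w = w_new

print(gradient_descent(lambda w: 2 * w))         # converges towards the minimum at w = 0
# With a learning rate that is too big (e.g. 1.5), |w| grows each step instead of shrinking.
```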
Stochastic Gradient Descent: randomized gradient descent, works better with large datasets. Momentum: a modification to SGD which smooths the gradient estimates with memory; large momentum = difficult to change direction; the learning rate itself is not modified. Finding Derivatives: in the general case use symbolic or automatic differentiation → gradients for complicated functions composed of differentiable operations → automatic application of the chain rule (TensorFlow, PyTorch). Local Minima: can get your optimizer trapped; a potential problem for non-linear models (such as neural networks), but not really a problem in high-dimensional data, and in most cases we don't care about local minima. Simplest way to avoid them → restart from a different, more accessible starting point. While searching for the global minimum the model can encounter many 'valleys', whose bottoms we call local minima. Depending on the model, if a valley is deep enough the process might get stuck there and we end up in a local minimum instead of the global one, which means a higher-than-optimal cost. This is not necessarily a big problem in high-dimensional data: the larger the parameter space, the less likely it is that the error function decreases in no direction at all, so there should be fewer local minima.
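One possible reading of SGD with momentum as described above (`grad`, the per-example gradient function, is a hypothetical placeholder):

```python
import random

# SGD with momentum: the velocity is a memory of past gradients that smooths
# the update direction; the learning rate itself is left untouched.
def sgd_momentum(grad, examples, w, learning_rate=0.01, momentum=0.9, epochs=10):
    velocity = 0.0
    for _ in range(epochs):
        random.shuffle(examples)                 # "stochastic": visit examples in random order
        for example in examples:
            g = grad(w, example)                 # noisy gradient from a single example
            velocity = momentum * velocity + g   # large momentum: hard to change direction
            w = w - learning_rate * velocity
    return w
```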
Decision Trees/Forest. Generalization: the ability to view something new in a related way. Goal of induction: take training data, use it to induce a function f, and evaluate f on test data; induction succeeds if performance on the test data is high. Advantages of DT: transparent, easily understandable, fast (no revision needed). Disadvantages of DT: an intricate tree shape that depends on minor details of the data, overfitting; try limiting the depth. Building a DT: the number of possible trees grows exponentially with the number of features, so the tree needs to be built incrementally. Ask the most important questions first, i.e. the ones which help us classify. Left branch → apply the algorithm to the NO examples, right branch → apply the algorithm to the YES examples. Recursion: a function that calls itself until some base case is reached (otherwise it would continue infinitely); base case = leaf node, recursive call = left/right subtree. (Un)balanced Trees: balanced trees are 'better' → faster, since speed depends on the depth of the tree. Prediction time does not depend on the number of questions but on the number of unique combinations. Discretization: use quantiles as thresholds, or choose thresholds present in the data. Measuring Impurity: used to find the best split condition (the quality of a question); splitting stops when no improvement is possible. Entropy I_H(P): a measure of the uniformity of a distribution. More uniform → more uncertainty (and thus the data is not divided enough); the tree tries to minimize uniformity. Gini Impurity I_G(P): measures how often a random element would be labelled incorrectly if labels were assigned randomly. Random Forest: many DTs with features randomly distributed over the different trees; generalizability increases and variance is lower, but interpretability is worse.
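The two impurity measures, assuming the standard Shannon-entropy and Gini formulas over the class proportions at a node:

```python
from math import log2

def entropy(proportions):
    # More uniform class distribution -> higher entropy -> the node is less pure.
    return sum(-p * log2(p) for p in proportions if p > 0)

def gini(proportions):
    # Chance that a random element is mislabelled if labels are assigned at random.
    return 1.0 - sum(p * p for p in proportions)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))   # 1.0 0.5  (maximally impure node)
print(entropy([1.0]), gini([1.0]))             # 0.0 0.0  (pure node)
```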
Feature Engineering. The process of transforming raw data into features that better represent the underlying problem to the predictive model, resulting in improved model accuracy on unseen data; it gets the most out of your data. Algorithms are generic, features are specific; feature engineering is often a major part of machine learning. Categorical features: some algorithms (decision trees/random forests) can use categorical features such as occupation or nationality directly; otherwise → convert them to numerical values. Feature engineering covers extracting features, transforming features and selecting features; it is domain specific, and domain expertise is needed. Common Feature Sources: text, visual, audio, sensors, surveys. Feature transformations: standardizing (z-scoring), log-transform, polynomial features (combining features). Text Features: word counts, word n-gram counts, character n-gram counts, word vectors. MEG: signal amplitude at a number of locations (channels) on the surface of the head, evolving in time. Feature Ablation Analysis: remove one feature at a time → measure the drop in accuracy → quantifies the contribution of that feature, given all the other features. Feature Learning: unsupervised learning of word vectors (LSA, word2vec, GloVe); neural networks can extract features from 'raw' inputs while learning (speech: audio wave, image: pixels, text: byte sequences). Pairwise interactions: linear classifiers need explicit information about joint occurrence. Always consider the expressiveness of your model when engineering features.
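Two of the feature transformations mentioned above, sketched in plain Python (illustrative only):

```python
from collections import Counter

def z_score(values):
    # Standardize a numeric feature column to mean 0 and unit variance.
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

def char_ngrams(text, n=3):
    # Character n-gram counts, a simple text feature.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

print(z_score([1.0, 2.0, 3.0, 4.0]))
print(char_ngrams("spam spam").most_common(3))
```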
