Summary: Applied Machine Learning

This document contains a summary of all exam material for the Applied Machine Learning course.


Week 1

Deep learning is based on the approach of having many hierarchy levels. The hierarchy of
concepts enables the computer to learn complicated concepts by building them out of simpler
ones.

A computer can reason automatically about statements in formal languages using logical
inference rules. This is known as the knowledge base approach to AI.

AI systems need the ability to acquire their own knowledge by extracting patterns from raw data.
This capability is known as machine learning.

The performance of simple machine learning algorithms depends heavily on the representation of
the data they are given.

Each piece of information included in the representation is known as a feature.

Representation learning: Use machine learning to discover not only the mapping from
representation to output but also the representation itself.

- Learned representations often result in much better performance than can be obtained with
hand-designed representations.

- An auto-encoder is the combination of an encoder function and a decoder function (a minimal sketch follows below).
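
As a rough illustration of that encoder/decoder composition, here is a minimal NumPy sketch; the weights are random placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(3, 8))  # placeholder weights: 8-dim input -> 3-dim code
W_dec = rng.normal(size=(8, 3))  # placeholder weights: 3-dim code -> 8-dim output

def encoder(x):
    return np.tanh(W_enc @ x)    # compress the input into a short code

def decoder(z):
    return W_dec @ z             # reconstruct the input from the code

x = rng.normal(size=8)
x_hat = decoder(encoder(x))      # the auto-encoder is exactly this composition
```
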
When designing features or algorithms for learning features, our goal is usually to separate the
factors of variation that explain the observed data.

- Most applications require us to disentangle the factors of variation and discard the ones that
we do not care about.

Deep learning solves the central problem of obtaining representations in representation learning
by introducing representations that are expressed in terms of other, simpler representations.

- The quintessential example of a deep learning model is the feedforward deep network, or
multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function mapping
some set of input values to output values. The function is formed by composing many
simpler functions, as sketched below.
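
A toy two-layer MLP in NumPy, purely to show the structure f(x) = f_2(f_1(x)); the weights are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden-layer parameters
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # output-layer parameters

def mlp(x):
    h = np.maximum(0.0, W1 @ x + b1)  # hidden layer: affine map + ReLU
    return W2 @ h + b2                # output layer: another affine map

y_hat = mlp(np.array([1.0, 2.0, 3.0]))  # a composition of simple functions
```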

Visible layer: contains the variables that we are able to observe.

Hidden layers: extract increasingly abstract features.

- Values are not given in the data; instead, the model must determine which concepts are
useful for explaining the relationships in the observed data.

In machine learning you have features x which are used to make predictions ŷ.

Labels are what you want to predict.

Features are the variables you use to make the prediction. They make up the representation.

The objective of regression: we want to predict a continuous output value (a scalar), given an input
vector.

- ŷ = f(x; w)
- ŷ = prediction
- f = regression function
- x = input vector
- w = parameters to learn

- The input is transformed using the parameters.

Linear regression:

- ŷ = f(x; w) = xᵀw
- The superscript T denotes the transpose; xᵀw is the dot product, so the number of parameters equals the number of features.
- We want a weighted sum of the features. This is done by taking the dot product of the two
vectors.
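
A minimal sketch of this forward computation in NumPy (made-up numbers):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # one datapoint with three features
w = np.array([0.5, -1.0, 2.0])  # one weight per feature

y_hat = x @ w  # the dot product x^T w = 0.5 - 2.0 + 6.0 = 4.5
```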

Weights and biases:

- If the input is a vector of zeros, x = [0, 0, 0, …]ᵀ, the output is always 0.
- To overcome this we add a bias (also known as an intercept).
- x′ = [x, 1]ᵀ
- w′ = [w, b]ᵀ
- Then ŷ = x′ᵀw′ = xᵀw + b, so we always have one more parameter to learn.
- The bias is an extra parameter that we always get; it is the same for all datapoints.
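
A small sketch of this bias trick, reusing the NumPy setup from above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 2.0])
b = 0.25

x_aug = np.append(x, 1.0)  # x' = [x, 1]
w_aug = np.append(w, b)    # w' = [w, b]

# The single dot product x'^T w' equals x^T w + b: same prediction,
# one extra parameter (the bias) that is shared by all datapoints.
assert np.isclose(x_aug @ w_aug, x @ w + b)
```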
Goodness of fit: given a machine learning model, how good is it? We measure that and give it a
score.

- Typically we measure the difference between the ground truth and the prediction.

- Loss function for one datapoint: (y_n − ŷ_n)²
- Learning objective (SSE): ½ Σ_n (y_n − x_nᵀw)²
- Here x_nᵀw = ŷ_n.
- The difference is squared to punish bigger mistakes harder.
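
A minimal sketch of the SSE objective on a tiny made-up dataset:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])     # one datapoint per row
y = np.array([1.0, 2.0, 3.0])  # ground-truth targets y_n
w = np.array([0.1, 0.2])

y_hat = X @ w                         # predictions x_n^T w for all n at once
sse = 0.5 * np.sum((y - y_hat) ** 2)  # 1/2 * sum_n (y_n - y_hat_n)^2
```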


Linear regression forward and loss: parameters are needed to compute the loss, while the loss is
needed to know how well the parameters perform.


The best parameters w are the ones with the lowest sum of squared errors (SSE).





To find the minimum SSE, we need to take the derivative of the SSE and set it to zero.

- s(w) = ½ Σ_n (y_n − x_nᵀw)² becomes:
- ds(w)/dw = −Σ_n (y_n − x_nᵀw) x_n (the derivative)
- We transform it to vectorised form: ds(w)/dw = −(y − wᵀx) xᵀ
- Setting the derivative to 0 gives: −(y − wᵀx) xᵀ = 0
- Solving this gives: w = (x xᵀ)⁻¹ x yᵀ
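
A sketch of this closed-form solution in NumPy. Note that here X stores one datapoint per row (the transpose of the convention above), so the same solution reads w = (XᵀX)⁻¹ Xᵀy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 datapoints, 3 features each
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)  # targets with a little noise

# Solve the normal equations (X^T X) w = X^T y; np.linalg.solve is
# numerically preferable to forming the inverse explicitly.
w = np.linalg.solve(X.T @ X, X.T @ y)
```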

Linear regression can be solved in one equation. Unfortunately, most machine learning models
cannot be solved this directly: most problems have more than one minimum (they are non-convex),
so the closed-form approach from before does not work.

Gradient descent:

- A slow, iterative way to get to the nearest minimum
- The gradient tells us the slope of a function
- Greedy approach
- Useful when the objective is non-convex
- Step-by-step guide:
1. Initialise the parameters randomly.

2. Take the gradient and update the parameters (keep taking new parameters and taking the
gradient until a minimum is found).

3. Stop when at a minimum and you can't go lower, i.e. the new step is not better than the
previous step.
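
A minimal sketch of these three steps for the SSE objective (X again holds one datapoint per row; the step size lr is a hypothetical value you would tune):

```python
import numpy as np

def gradient_descent(X, y, lr=0.001, tol=1e-8, max_steps=10_000):
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])          # 1. initialise parameters randomly
    for _ in range(max_steps):
        grad = -X.T @ (y - X @ w)            # 2. gradient of 1/2 * sum (y_n - x_n^T w)^2
        w_new = w - lr * grad                #    step downhill along the slope
        if np.linalg.norm(w_new - w) < tol:  # 3. stop when a step no longer improves
            return w_new
        w = w_new
    return w
```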

Regression is nothing more than finding those parameters that minimise our squared errors.

Parameters are values that we need to learn.

Hyperparameters are parameters that we would like to learn but unfortunately cannot, so
we have to set them ourselves.

The learning rate (the step size) is an important hyperparameter.

Setting the step size in gradient descent:

- Too low: a small learning rate requires many updates before reaching the minimum.
- Just right: the optimal learning rate swiftly reaches the minimum.
- Too high: a too-large learning rate causes drastic updates, which leads to divergent behaviour
and overshooting the minimum.





Stochastic gradient descent:

- Go over subsets of the examples, compute the gradient for each subset, and update.
- Solves the problem of having to go over all samples for every gradient-descent update.
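
A minimal sketch of minibatch SGD under the same assumptions (batch size and learning rate are hypothetical values to tune):

```python
import numpy as np

def sgd(X, y, lr=0.001, batch_size=8, epochs=50):
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)                 # shuffle the examples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # one small subset of examples
            Xb, yb = X[idx], y[idx]
            grad = -Xb.T @ (yb - Xb @ w)           # gradient on the subset only
            w -= lr * grad                         # update right away
    return w
```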
Linear regression is a one-layer network with:

- Forward propagation: compute ŷ = wᵀx
- Backward propagation: compute the gradient of the loss with respect to the parameters w
- Loss: the squared difference ½ (y − ŷ)², with gradient −(y − ŷ) with respect to ŷ
Polynomial regression:

- New forward function: ŷ = w_1ᵀx + w_2ᵀ(x²) + … + w_nᵀ(xⁿ)
- The higher the value of n, the more non-linear the regression function.
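
A sketch of this via explicit feature expansion: stack x, x², …, xⁿ as extra columns and reuse the linear closed form from before (the target function here is made up):

```python
import numpy as np

def polynomial_features(X, degree):
    # Stack x, x^2, ..., x^degree as columns of one expanded design matrix.
    return np.hstack([X ** d for d in range(1, degree + 1)])

X = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)  # 20 one-feature datapoints
y = 2 * X[:, 0] ** 3 - X[:, 0]                 # a made-up cubic target

X_poly = polynomial_features(X, degree=3)      # now 3 features per datapoint
w = np.linalg.solve(X_poly.T @ X_poly, X_poly.T @ y)  # same closed form as before
```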
You can solve overfitting by adding more data, but this does require a lot of data.

Tackling overfitting with regularisation:

- Datapoint x_n

- True value y_n

- Predicted value ŷ = f(x_n; w)

- Learning objective: min_w ½ Σ_n (y_n − wᵀx_n)² + λ R(w)
- λ is a hyperparameter (the regularisation strength)
- With R(w) = Σ_d w_d²
- The lower the values of the weights, the lower the penalty.
- Intuition: high weights are a key factor in overfitting
- Find a balance between fit and complexity
- Using only R(w) would make w = 0 the best option
- Regularisation involves adding a penalty term to the model's optimisation objective, discouraging overly
complex models by penalising large parameter values or high complexity.
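
A sketch of ridge (L2) regularisation matching the objective above; in closed form the penalty only adds a scaled identity matrix to the normal equations:

```python
import numpy as np

def ridge_objective(X, y, w, lam):
    # 1/2 * sum_n (y_n - w^T x_n)^2 + lam * sum_d w_d^2
    return 0.5 * np.sum((y - X @ w) ** 2) + lam * np.sum(w ** 2)

def ridge_fit(X, y, lam):
    # Setting the gradient of the objective above to zero gives
    # (X^T X + 2*lam*I) w = X^T y; larger lam shrinks w toward zero,
    # trading a slightly worse fit for lower model complexity.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + 2.0 * lam * np.eye(d), X.T @ y)
```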





