Lecture notes Machine Learning Week 1 - 5

Document information

Uploaded on: 21 March 2021
Number of pages: 20
Written in: 2019/2020
Type: Lecture notes
Professor(s): Peter Bloem
Contains: Week 1 - 5 (2 lectures per week)


Content preview

Machine Learning Week 1:

Lecture 1: Introduction

Deductive reasoning: discrete, unambiguous, known rules.
Inductive reasoning: ambiguous, experimental, rules unknown -> has to be broken down into a few rules.
Machine learning: provides systems with the ability to automatically learn and improve from experience without being explicitly programmed (used inside other software, in analytics/data mining, and in science/statistics).
What makes a good ML problem?
● can’t solve it explicitly, approximate solutions are fine
● limited reliability, predictability, interpretability is fine
● plenty of examples to learn from
- e.g. recommending a movie, predicting driving time
Abstract tasks generalize the problem of learning
Online learning: acting and learning at the same time
Reinforcement learning: online learning in a world based on delayed feedback
Offline learning: separate learning and acting - take a fixed dataset of examples, train a
model to learn from these examples, test the model to see if it works
Solve ML problems in general by using a few abstract tasks like classification, regression or
clustering.
1. Supervised: explicit examples of input and output (labeled data); learn to predict the output for an unseen input (e.g. sorting emails into spam or not).
- Classification: assign a class to each example
Instances: the data we provide to our system (the examples)
Features of the instance: the things we measure
Target value: the thing we are trying to learn
The dataset is fed to a learning algorithm, which produces a classifier: a small machine that attempts to solve the learning problem.

Example: handwriting recognition, chess problem -> translate problems into
tasks.
Feature space: features plotted against each other
1. Linear classifier: draw a line. Since a line is defined by two parameters, we can also plot each model as a point in a model space (the two parameters of the line). We define a loss function which tells us how well a particular model does on our data.
loss(model) = performance of the model on the data. It takes the model as input and treats the data as a constant.
2. Decision Tree classifier: studies one feature in isolation at every node.
3. K-nearest neighbours: for a new point, look at the k points that are closest and assign the class that is most frequent in that set (see the sketch below).
k is a hyperparameter: you have to choose it yourself.
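A minimal NumPy sketch of this classifier, assuming Euclidean distance and majority voting; the function name, toy data, and k=3 are illustrative, not from the lecture:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Distance from the new point to every training instance
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Assign the class that is most frequent among those neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two features per instance, two classes
X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 1.0])))  # -> 0
```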



Variations:
Features are usually numerical or categorical

Binary classification: a task with two classes
Multiclass classification: more than two classes
Multilabel classification: none, some or all classes may be true (not
part of course)
Class probabilities/scores: the classifier reports a probability for each
class

- Regression: assign a number to each example; predict a number instead of a class. Output space and feature space are combined in the same model. We want to model the relation between the features and the target.
Loss function for regression (MSE): $\frac{1}{n}\sum_i \left(f(x_i) - y_i\right)^2$
Regression tree: segment feature space into blocks and assign each a
number.
-> Overfitting (memorizing the data instead of generalizing). Training loss: the loss of a given model on the training data. Never judge performance on the training data; hold back some unseen data.

The aim of machine learning is not to minimise the loss on the training data,
but to minimise the loss on your test data. You don’t get to see the test data
until you’ve chosen your model!
The problem of machine learning is to choose a model that fits the pattern,
and ignores the noise.
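A minimal sketch of this idea, assuming a 1-D toy dataset and a deliberately over-flexible polynomial model (np.polyfit is just a stand-in for any trained model):

```python
import numpy as np

def mse(model, X, y):
    # Mean squared error: average of squared residuals
    return np.mean((model(X) - y) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 30)
y = 2 * X + 1 + rng.normal(0, 1, 30)   # noisy linear data

X_train, y_train = X[:20], y[:20]      # data the learner sees
X_test,  y_test  = X[20:], y[20:]      # unseen data, used only to judge

# An over-flexible model memorizes the noise in the training data
w = np.polyfit(X_train, y_train, deg=9)
model = lambda x: np.polyval(w, x)
print(mse(model, X_train, y_train))    # low training loss
print(mse(model, X_test, y_test))      # test loss is usually higher
```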

2. Unsupervised: only inputs provided, find any pattern that explains something about
the data
- Clustering: ask the learner to split the instances into a number of clusters.
(unsupervised classification)
K-means: repeatedly assign each point to the nearest mean and recompute the means, until the algorithm converges to a stable state (see the sketch below).
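A minimal NumPy sketch of the k-means loop, assuming Euclidean distance and that no cluster ends up empty; initializing the means from random data points is one common choice:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the means with k randomly chosen data points
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest mean
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each mean (assumes no cluster ends up empty)
        new_means = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_means, means):  # stable state: converged
            break
        means = new_means
    return labels, means
```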
- Density Estimation: we want to learn how likely new data is. It thus produces
a number. The task of modelling the probability distribution behind your data.
- Generative Modeling: building a model from which you can sample new
examples is called generative modeling. -> deep learning




Lecture 2: Linear Models 1

Linear models and search: search methods are extremely versatile.
- Linear regression: with 1 feature, f(x) = wx + b. If we have 2 features, each feature gets its own weight. With multiple features, we arrange the features of a single instance into a vector.
-> arbitrary number of features: f(x) = w·x + b, i.e. take the dot product of the weights and the data and add a bias. The weights and the bias are the parameters of the model.
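A minimal sketch of this model in NumPy; the particular numbers are only an illustration:

```python
import numpy as np

def linear_model(x, w, b):
    # f(x) = w . x + b: dot product of weights and features, plus a bias
    return np.dot(w, x) + b

x = np.array([1.0, 2.0, 3.0])    # one instance with three features
w = np.array([0.5, -1.0, 2.0])   # one weight per feature (parameters)
b = 0.1                          # bias (also a parameter)
print(linear_model(x, w, b))     # 0.5 - 2.0 + 6.0 + 0.1 = 4.6
```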

Which model fits our data best?
1. A loss function, which tells us how well a particular choice of model does.
2. A search method: search the space of all models for a particular model that results in a low loss.
● Common Loss function for regression: MSE. Squares ensure that negative and
positive residuals don’t cancel out, but also that big errors affect the loss more
heavily than small errors.
-> the loss function maps every point in the model space to a loss value.
● Optimization: $\hat{p} = \arg\min_{p} \text{loss}_{X,Y}(p)$. We are trying to find the input (the model parameters) for which a particular function (the loss) is at its minimum, i.e. to find as low a value as possible.
1. Random search: start with a random point p; loop: pick a point p' close to p; if loss(p') is smaller than loss(p), set p <- p'. Analogy: a hiker in a snowstorm!
It chooses the next point by sampling uniformly among all points at some pre-chosen distance.
It works well here because our problem is convex: a surface is convex if a line drawn between any two points on the surface lies entirely above the surface. This implies there is only one minimum.
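A minimal sketch of random search on a convex loss, assuming steps sampled uniformly on a sphere of fixed radius; the loss function, step size, and iteration count are illustrative:

```python
import numpy as np

def random_search(loss, p, step=0.1, n_iter=10_000, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        # Sample uniformly among all points at distance `step` from p
        d = rng.normal(size=p.shape)
        p_new = p + step * d / np.linalg.norm(d)
        # Move only if the new point has lower loss: p <- p'
        if loss(p_new) < loss(p):
            p = p_new
    return p

# Convex example: a paraboloid with its minimum at (3, -2)
loss = lambda q: (q[0] - 3) ** 2 + (q[1] + 2) ** 2
print(random_search(loss, np.array([0.0, 0.0])))  # close to [3, -2]
```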

-> What if the loss surface has multiple local minima? The random search will
go to one of the local minima and get stuck there.
Trick: simulated annealing: sometimes move to the next point even when it isn't better than the current one, but only with a small probability. The search then still has some probability of escaping a local minimum.
In many situations local minima are fine.
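A minimal sketch of simulated annealing; the Metropolis-style acceptance probability exp(-delta/temp) and the exponential cooling schedule are common choices, not specified in the lecture:

```python
import numpy as np

def simulated_annealing(loss, p, step=0.1, temp=1.0, cooling=0.999,
                        n_iter=10_000, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        d = rng.normal(size=p.shape)
        p_new = p + step * d / np.linalg.norm(d)
        delta = loss(p_new) - loss(p)
        # Always accept improvements; accept a worse point with a
        # small probability, so the search can escape local minima
        if delta < 0 or rng.random() < np.exp(-delta / temp):
            p = p_new
        temp *= cooling  # slowly shrink the escape probability
    return p
```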
● Variations in step size: fixed radius, random uniform or normally distributed.
● The space of linear models is continuous; if the space is discrete, as with tree models, you need to define which models are close to each other.
● Black Box Optimization: random search, simulated annealing. Very simple. We only
need to compute the loss function for each model. Can require many iterations. Also
works for discrete model spaces.
● Parallel search: run random search a couple of times independently.
- Introduce some form of communication between the searches to make it more useful: population methods.
Evolutionary algorithms: rank the population by loss, remove the worst half, and 'breed' (crossover operator) a new population (see the sketch below).
-> Slow and expensive for complex models, difficult to tune.
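A minimal sketch of such an evolutionary loop, assuming real-valued models; averaging two random parents as a simple crossover and adding Gaussian mutation noise are illustrative choices, not from the lecture:

```python
import numpy as np

def evolve(loss, dim=2, pop_size=50, n_gen=100, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, dim))     # random initial population
    n_children = pop_size // 2
    for _ in range(n_gen):
        # Rank the population by loss and remove the worst half
        pop = pop[np.argsort([loss(p) for p in pop])]
        parents = pop[:pop_size - n_children]
        # 'Breed': average two random parents, then add mutation noise
        pairs = rng.integers(0, len(parents), size=(n_children, 2))
        children = parents[pairs].mean(axis=1)
        children += noise * rng.normal(size=(n_children, dim))
        pop = np.concatenate([parents, children])
    return min(pop, key=loss)  # best model found

loss = lambda q: (q[0] - 3) ** 2 + (q[1] + 2) ** 2
print(evolve(loss))  # close to [3, -2]
```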
● Branching Search: the more closely we inspect our local neighbourhood, the faster
we converge.