
Machine Learning 2 Summary / Lecture Notes — Midterm


This document contains, per lecture, all the information I gathered (incl. drawings and cut-outs from the slides) for the midterm of Machine Learning 2.


Document information

Published on
12 September 2024
Number of pages
11
Written in
2023/2024
Type
Lecture notes
Professor(s)
Heysem Kaya & Meaghan Fowlie
Contents
All classes


Content preview

Lecture 1

Prerequisites test (Remindo):
log(ab) = log(a) + log(b)
exp(a + b) = exp(a) · exp(b)
exp(ab) = (exp(a))^b
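A quick numerical check of these three identities (a minimal sketch with numpy; the values of a and b are arbitrary examples, not from the notes):

```python
import numpy as np

a, b = 2.5, 4.0  # arbitrary positive example values

assert np.isclose(np.log(a * b), np.log(a) + np.log(b))   # log(ab) = log(a) + log(b)
assert np.isclose(np.exp(a + b), np.exp(a) * np.exp(b))   # exp(a+b) = exp(a)·exp(b)
assert np.isclose(np.exp(a * b), np.exp(a) ** b)          # exp(ab) = (exp(a))^b
```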
Regression recap: hypothesis ≈ true function
Supervised learning where each datapoint is of the form (x, t) with t ∈ R, and we look for a hypothesis f s.t. t ≈ f(x)
Linear regression: we look for a hypothesis s.t. t ≈ x^T w
The design matrix X allows us to fit polynomials of degree up to K —> for N datapoints we have N rows, and each column is a feature
Learned from bias-variance analysis / VC-dimension: a smaller hypothesis class may mean better generalisation performance
Overfitting: the algorithm is allowed to pick hypotheses that are too complex (it fits random noise too well at the expense of fitting the true function underlying the data)
Another way to avoid too-complex hypotheses: regularisation (a soft constraint over a continuous spectrum of hypotheses, from simple to complex)




Instead of finding the weight vector w that minimises the squared-error loss L(w) = (1/N) (Xw − t)^T (Xw − t),
we'll find the one minimising the penalised loss L(w) + λ w^T w (the term λ w^T w is the penalty)
—> If fitting the data requires large weights, the algorithm can still pick them, as long as the increase in penalty is offset by enough reduction in loss (λ is used to control the trade-off between penalty and loss)
K-fold cross validation to find a good trade-off
We want to validate each value of λ on each of the K folds, and average those K results for each λ
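A minimal sketch of that K-fold procedure (illustrative only: fit_ridge uses the regularised least-squares solution derived just below, and the names fit_ridge, cv_error and lambdas are mine, not from the notes):

```python
import numpy as np

def fit_ridge(X, t, lam):
    """Regularised least squares: w = (X^T X + N*lam*I)^-1 X^T t (derived below)."""
    N, D = X.shape
    return np.linalg.solve(X.T @ X + N * lam * np.eye(D), X.T @ t)

def cv_error(X, t, lam, K=5):
    """Average validation error of one lambda value over K folds."""
    N = X.shape[0]
    folds = np.array_split(np.random.permutation(N), K)
    errs = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(N), val_idx)
        w = fit_ridge(X[train_idx], t[train_idx], lam)
        errs.append(np.mean((X[val_idx] @ w - t[val_idx]) ** 2))
    return np.mean(errs)  # average the K results for this lambda

# pick the lambda with the lowest average validation error, e.g.
# lambdas = np.logspace(-6, 1, 20)
# best_lam = min(lambdas, key=lambda lam: cv_error(X, t, lam))
```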
Finding the optimal regularised w: take the partial derivative of the penalised loss with respect to w, set the expression to zero and solve for w —> w = (X^T X + NλI)^(−1) X^T t, the regularised least-squares solution




[Example design matrix X for polynomial features: row n of X is (1, xn, xn^2, …, xn^K); the first column is all 1s]
L(w) = (1/N) (Xw − t)^T (Xw − t) + λ w^T w
     = (1/N) (w^T X^T X w − 2 t^T X w + t^T t) + λ w^T w
∂L/∂w = (2/N) X^T X w − (2/N) X^T t + 2λw
Set to zero: (2/N) X^T X w − (2/N) X^T t + 2λw = 0
((2/N) X^T X + 2λI) w = (2/N) X^T t
(X^T X + NλI) w = X^T t
w = (X^T X + NλI)^(−1) X^T t
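A small numerical check of this closed form (a sketch on synthetic data; the sine target, noise level and sizes are illustrative assumptions): the gradient of the penalised loss should vanish at the returned w.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, lam = 50, 3, 0.1                               # illustrative sizes and lambda
x = rng.uniform(-1, 1, N)
X = np.vander(x, K + 1, increasing=True)             # row n = (1, xn, ..., xn^K)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, N)    # noisy example targets

# regularised least-squares solution: w = (X^T X + N*lam*I)^-1 X^T t
w = np.linalg.solve(X.T @ X + N * lam * np.eye(K + 1), X.T @ t)

# gradient of the penalised loss at w should be (numerically) zero
grad = (2 / N) * (X.T @ X @ w - X.T @ t) + 2 * lam * w
print(np.allclose(grad, 0))                          # expected: True
```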

Lecture 2 (07/09/23)
A different way to look at linear regression:
1) zoom in on how the data might be generated; looking at the probability distribution the data may be drawn from
Reason backward (generated data) to the true function we want to figure out
We determine the distribution, and if our model is close enough to reality, it may be useful (do realise that noise plays a
role in prediction)
Goal: learn how to predict a good t, when given x —> Focus on conditional distribution p(t1,…,tN | X) for N points
Probability distributions: example of 5 coin throws, X = number of heads (X ∈ {0, …, 5}); "heads = 2" means 2 of the 5 throws landed heads
P = probability of an event, p = density; P(Y = y) is the probability that the random variable Y takes the value y
A property of a PDF is that it's continuous; a density value is not itself a probability
Mean: the true function's value for t at x (note that P(T = 10.25 | X = 1980) = 0 for a continuous distribution)
Variance: usually unknown, σ2
Probabilistic independence: p(z1, …, zN) = p(z1) · p(z2) · … · p(zN)




Dependent random variables; x, y depend on each other (knowing value of x gives info on y)
Independent random variables; we look at PDF of x and y separately -> p(x , y) = p(x)p(y)
Dependent variables are necessary for us to be able to learn anything from training data points about new data
Independent noise: the noise terms εn: tn = f(xn) + εn, where f is the true function (randomly sample x, compute t, add noise)




Information in tn that’s relevant for predicting other t’s should be captured in f
The info in εn should be irrelevant for predicting other t’s —> noise terms are independent
! Conditional independence (x conditionally independent of y, given z): p(x, y | z) = p(x|z) p(y|z)
Conditional independence between the t's, given f, σ2, and X, allows us to write p(t1, …, tN | f, σ2, X) = p(t1 | f, σ2, x1) · … · p(tN | f, σ2, xN)
-> and we decided that each distribution p(tn | f, σ2, xn) should be Gaussian with mean f(xn) and variance σ2
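A minimal sketch of this generative story (assumptions for illustration only: f is a sine function and σ2 = 0.05; none of these specifics are from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    """An assumed 'true' function, just for illustration."""
    return np.sin(2 * np.pi * x)

N, sigma2 = 100, 0.05
x = rng.uniform(0, 1, N)                    # randomly sample x
eps = rng.normal(0, np.sqrt(sigma2), N)     # independent Gaussian noise, variance sigma2
t = f(x) + eps                              # tn = f(xn) + eps_n
```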




During regression we have data (x & t) but don’t know f or σ2 —> we look for the f for which our data would have been
most likely

We look at a likelihood function L as a function of f and σ2 while we hold data fixed
Note: in linear regression we're not looking for an arbitrary function f, but one that can be described by a weight vector w s.t. f(x) = x^T w
Likelihood for a single data point: Ln ∝ exp(−(tn − xn^T w)^2 / (2σ2))
To express the full likelihood we take the product over all data points, which we can simplify by taking the logarithm (big product —> big sum)
*log is monotonically increasing, so the parameters w and σ2 that maximise L will also maximise log L
! To maximize the likelihood we take the derivative of log L w.r.t. w, set it to 0 and get w = (X^T X)^(−1) X^T t, which is the same w that minimized squared loss
We can also find the maximum-likelihood σ2 by setting the derivative of log L with respect to σ2 to zero
! Solution: σ2 = (1/N) Σn (tn − xn^T w)^2, which measures the avg squared deviation of tn from its mean (analogous to the definition of variance)
The larger the difference between predictions and data, the larger σ2 gets
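A short sketch of both maximum-likelihood estimates (the function name and the use of numpy are my own, not from the notes):

```python
import numpy as np

def max_likelihood_fit(X, t):
    """ML estimates for linear regression with Gaussian noise:
    w_ml     = (X^T X)^-1 X^T t            (same as least squares)
    sigma2_ml = (1/N) * sum_n (tn - xn^T w_ml)^2
    """
    w_ml = np.linalg.solve(X.T @ X, X.T @ t)
    residuals = t - X @ w_ml
    sigma2_ml = np.mean(residuals ** 2)   # grows with the prediction error
    return w_ml, sigma2_ml
```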

To know for sure that the calculated ‘derivative set to zero’ is a maximum we can check that the 2nd derivative is
negative (check slide)
For functions of vectors, we need the Hessian (matrix of second partial derivatives) to be negative definite (slide 21!!)
This means all eigenvalues need to be negative
The Hessian of the log-likelihood with respect to w is −(1/σ2) X^T X
For it to be negative definite we need z^T (−(1/σ2) X^T X) z < 0 for all z ≠ 0, i.e. we need to check that z^T X^T X z > 0 for all z ≠ 0
z^T X^T X z = (Xz)^T (Xz) = Σn (Xz)n^2, a sum of squares: each square is ≥ 0, so the sum is also ≥ 0, and it is only 0 if all squares are 0 (i.e. Xz = 0)
So, except in rare cases (when Xz = 0 for some z ≠ 0, i.e. X does not have full column rank), our w is indeed the weight vector that maximizes the likelihood
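A quick numerical version of this check (a sketch: since the Hessian is −(1/σ2) X^T X, it is negative definite exactly when every eigenvalue of X^T X is strictly positive):

```python
import numpy as np

def hessian_is_negative_definite(X):
    """The Hessian of the log-likelihood w.r.t. w is -(1/sigma2) X^T X,
    so it is negative definite iff all eigenvalues of X^T X are > 0
    (equivalently, X has full column rank)."""
    eigvals = np.linalg.eigvalsh(X.T @ X)   # X^T X is symmetric
    return bool(np.all(eigvals > 0))
```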
Maximizing likelihood: find the parameter values that make the observed data most probable
Minimizing the regularized least-squares loss: find the parameter values that minimize the errors between predicted and observed values + a regularization constant, to avoid overfitting