Pattern Recognition Summary & Course Notes


This document contains notes and summaries covering the content of the course Pattern Recognition within the Artificial Intelligence Master at Utrecht University. It covers the following topics: linear models for regression, linear models for classification 1 & 2, intro to deep learning and neural networks, CNNs, and RNNs.


Pattern Recognition Class Notes

PR CLASS 1 - LINEAR MODELS FOR REGRESSION

– what’s statistical pattern recognition --> the field of PR/ML concerned
with the automatic discovery of regularities in data through the use of
computer algorithms, and with the use of these regularities to take
actions such as classifying data into categories
– ML approach to do this --> use a set of training data of N labeled
examples and fit a model to this data; the model can subsequently be
used to predict the class of new input vectors; the ability to
categorize new data correctly is called generalization
– supervised: numeric target (regression), discrete unordered target
(classification), discrete ordered target (ordinal classification, ranking)
– unsupervised learning: clustering, density estimation
– always in data analysis --> DATA = STRUCTURE + NOISE --> we want
to capture the structure but not the noise
– coefficients (weights) of a linear or nonlinear function are learned or
estimated from the data
– maximum likelihood estimation --> given N independent observations
from the same probability distribution (e.g. the probability of
observing x successes in y trials), find the particular parameter value
under which the observed data are most likely to have come from that
distribution; this also tells us how likely a future observation is to
resemble observations drawn from the general distribution (a minimal
sketch follows below)
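
A minimal sketch of the MLE idea (illustrative only, not from the slides; plain numpy, with a Bernoulli example standing in for the success/trial setting above):

```python
import numpy as np

# Illustrative MLE sketch (not from the course): estimate the success
# probability p of a Bernoulli distribution from N = 100 observations.
rng = np.random.default_rng(0)
data = rng.binomial(n=1, p=0.7, size=100)  # hypothetical 0/1 outcomes

# Log-likelihood of a candidate p given the observed outcomes.
def log_likelihood(p, x):
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Pick the value on a grid under which the observed data are most likely.
grid = np.linspace(0.01, 0.99, 99)
p_hat = grid[np.argmax([log_likelihood(p, data) for p in grid])]

print(p_hat)        # near 0.7
print(data.mean())  # for a Bernoulli, the closed-form MLE is the sample mean
```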
– score = quality of fit + complexity penalty
– decision theory --> when wanting to make a decision in a situation
involving uncertainty, then two steps:
– 1) inference: learn p(x,t) from data (main subject of this course)
– 2) decision: given the estimate of p(x,t), determine optimal
decision (not focus on this in this course)
– suppose we know the joint probability distribution p(x,t):
– the task is --> given value for x, predict class labels t
– Lkj is the loss of predicting class j when the true class is k, K is the
number of classes
– then, to minimize the expected loss, predict the class j that
minimizes the sum over k of Lkj * p(class k | x) (the function at
page 37 of the lecture 1 slides; see the sketch below)
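
A small sketch of this decision rule with a made-up 2-class loss matrix and posterior (all numbers hypothetical); it shows how an asymmetric loss can override the most probable class:

```python
import numpy as np

# L[k, j] = loss of predicting class j when the true class is k
# (hypothetical values: missing class 1 is five times worse).
L = np.array([[0.0, 1.0],
              [5.0, 0.0]])
posterior = np.array([0.8, 0.2])  # assumed p(k|x) for one input x

expected_loss = posterior @ L     # sum over k of L[k, j] * p(k|x), per class j
best_class = int(np.argmin(expected_loss))

print(expected_loss)  # [1.0, 0.8]
print(best_class)     # 1: cheaper in expectation, despite p(0|x) = 0.8
```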
– useful properties of expectation and variance:
– E[c] = c for constant c.
– E[cx] = cE[x].
– E[x ± y ] = E[x] ± E[y ].
– var[c] = 0 for constant c.
– var[cx] = c²var[x].
– var[x ± y ] = var[x] + var[y ] if x and y independent
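
A quick Monte Carlo check of the variance rules above (illustrative only; the equalities hold approximately for large samples):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)
y = rng.normal(loc=-1.0, scale=0.5, size=1_000_000)  # independent of x
c = 4.0

print(np.var(c * x), c**2 * np.var(x))       # var[cx] ~ c^2 var[x]
print(np.var(x + y), np.var(x) + np.var(y))  # var[x+y] ~ var[x] + var[y]
```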

– to minimize the expected squared error we should predict the mean of
t at the point x0
– generally y(x) = E[t|x] --> the population regression function
– simple approach to regression:
– predict the mean of the target values of all training observations
with x = x0
– problem here is that, since x is numeric, in practice there will
typically be no training observations with exactly x = x0
– possible solution --> look at data points that are close to where
I want to make the prediction, then take the mean of those training
points
– idea here is KNN for regression, e.g. with input variable x and output
variable t:
– 1) for each x define a neighborhood Nk(x) containing the
indices n of the k points (xn, tn) closest to x
– with k = a small value (e.g. 2) the line is very wiggly, but with k =
a larger value (e.g. 20) the line is much smoother because we are
averaging over more nearest neighbors
– k is a complexity hyperparameter for KNN (see the sketch below)
– for a regression problem, the best predictor for the output variable is
the mean of the output variable for a given input x, but this is not
applicable with a finite data sample, so:
– the NN function approximates the mean by averaging over training data
– the NN function relaxes the conditioning at a specific input to
conditioning on a neighborhood of that input
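
A minimal k-NN regression sketch under these assumptions (hypothetical 1-D data, plain numpy; not the course's reference implementation):

```python
import numpy as np

# Predict at x0 by averaging the targets of the k training points
# whose inputs are closest to x0.
def knn_predict(x_train, t_train, x0, k):
    idx = np.argsort(np.abs(x_train - x0))[:k]  # indices of the k nearest points
    return t_train[idx].mean()                  # average of their targets

rng = np.random.default_rng(2)
x_train = rng.uniform(0, 10, size=50)
t_train = np.sin(x_train) + rng.normal(scale=0.3, size=50)  # structure + noise

# Small k follows the noise (wiggly fit); large k averages more neighbours
# (smoother fit): k controls complexity.
print(knn_predict(x_train, t_train, x0=5.0, k=2))
print(knn_predict(x_train, t_train, x0=5.0, k=20))
```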







PR CLASS 2 - LINEAR MODELS FOR CLASSIFICATION 1

– assumption in linear regression model is that the expected value of t is
a linear function of x
– have to estimate slope and intercept from data
– the observed values of t are equal to the linear function of x plus
some random error
– given a set of train points (N), we have N pairs of input variables x and
target variable t
– find the values for w0 and w1 that minimize the sum of squared errors
(a minimal sketch follows below)
– the average value is denoted with a bar above the variable (e.g. x̄)
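
A sketch of the standard closed-form least-squares estimates on synthetic data (variable names and numbers are illustrative): w1 = sum((x - x̄)(t - t̄)) / sum((x - x̄)²) and w0 = t̄ - w1·x̄.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 5, size=100)
t = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=100)  # linear function + noise

# Closed-form least-squares estimates of slope and intercept.
x_bar, t_bar = x.mean(), t.mean()
w1 = np.sum((x - x_bar) * (t - t_bar)) / np.sum((x - x_bar) ** 2)
w0 = t_bar - w1 * x_bar

print(w0, w1)  # close to the true intercept 1.5 and slope 2.0
```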

– r² (coefficient of determination):
– a metric for evaluating regression predictions
– define three terms:
– 1) total sum of squares (SST) --> the total variation of t in the
sample data
– 2) explained sum of squares (SSR) --> the part explained by the
regression, i.e. the variation in t explained by the regression
function
– 3) error sum of squares (SSE) --> the part not explained by the
regression: the sum over all observations of the squared difference
between the actual value of t and the prediction
– therefore we have SST = SSR (part explained by regression) + SSE
– r² = proportion of variation in t explained by x:
– r² = SSR/SST = 1 - SSE/SST
– output is a number between 0 and 1 (see the sketch below)
– if = 1, all data points are on the regression line
– if = 0, the regression explains none of the variation in t
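
A short sketch computing the three sums of squares and r² on synthetic data, reusing the closed-form fit shown earlier:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 5, size=100)
t = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=100)

# Least-squares fit (same closed form as in the previous sketch).
w1 = np.sum((x - x.mean()) * (t - t.mean())) / np.sum((x - x.mean()) ** 2)
w0 = t.mean() - w1 * x.mean()
t_hat = w0 + w1 * x                    # fitted values

SST = np.sum((t - t.mean()) ** 2)      # 1) total sum of squares
SSR = np.sum((t_hat - t.mean()) ** 2)  # 2) explained sum of squares
SSE = np.sum((t - t_hat) ** 2)         # 3) error sum of squares

print(np.isclose(SST, SSR + SSE))      # True: SST = SSR + SSE
print(SSR / SST, 1 - SSE / SST)        # equal: two ways to compute r^2
```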
– the above (the derivation of w0 and w1) is the calculus-based solution
– but let's check the geometrical approach:
– suppose the population regression line goes through the origin,
meaning that the intercept w0 = 0
– then follow the same approach; the result is the same, just the
regression error is different
– the error vector should be perpendicular to the vector of x,
meaning that the dot product of x and the error vector should be 0
– in the least squares solution:
– the fitted vector ŷ is a linear combination of the columns of X
– ŷ = Xw
– we want the error vector to be as small as possible (see the sketch
below)
– the observed target values are not a linear combination of the
predictor variables --> if that were the case, we'd get a perfect
regression line (all the points on it), which is just never the case
with real data
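
A sketch of this orthogonality condition (a standard identity, not taken from the slides): solving the normal equations XᵀXw = Xᵀt makes the error vector perpendicular to every column of X.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))  # hypothetical design matrix
t = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w = np.linalg.solve(X.T @ X, X.T @ t)  # least-squares weights
residual = t - X @ w                   # the error vector

print(w)               # close to the true weights (1, -2, 0.5)
print(X.T @ residual)  # ~0: the error vector is orthogonal to each column of X
```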
– general linear model:
– the term linear means that it has to be a linear function of the
coefficients/weights, but not necessarily of the input variables
– it is linear in the features, but not necessarily in the input
variables that generate them
– an interaction between two features modifies the effect of these two
variables on the target variable (e.g. age and income on pizza
expenditure; see the sketch below)
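
A sketch of an interaction feature, echoing the age/income example (all variables and coefficients are hypothetical); the model stays linear in the weights even though the feature age*income is not linear in the inputs:

```python
import numpy as np

rng = np.random.default_rng(6)
age = rng.uniform(18, 70, size=200)
income = rng.uniform(10, 100, size=200)
# Hypothetical target: the effect of income depends on age (interaction).
t = (5 + 0.1 * age + 0.3 * income - 0.004 * age * income
     + rng.normal(scale=0.5, size=200))

# Design matrix: intercept, age, income, and the interaction feature.
X = np.column_stack([np.ones_like(age), age, income, age * income])
w, *_ = np.linalg.lstsq(X, t, rcond=None)

print(w)  # roughly (5, 0.1, 0.3, -0.004); still an ordinary linear model
```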
– regularized least squares:
– add a regularization term to control for overfitting, in those cases
where the number of predictor variables is very large
– here use ridge regression (weight decay in NN)
– penalize the size of the weights
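
A sketch of ridge regression using its standard closed form w = (XᵀX + λI)⁻¹Xᵀt (the intercept is left out for simplicity; λ and the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 10))  # many predictors relative to the sample size
t = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=50)

lam = 1.0  # regularization strength (hyperparameter)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ t)
w_ols = np.linalg.solve(X.T @ X, X.T @ t)

# Penalizing the squared size of the weights shrinks them toward zero.
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))  # True
```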
