
Lecture notes: Introduction to Machine Learning (8BB020)


This document contains, per lecture, everything you need to know for the exam. Very handy to review.













Document information

Uploaded on: 31 October 2025
Number of pages: 42
Written in: 2024/2025
Type: Lecture notes
Lecturer(s): Dr. Federica Eduati
Covers: All lectures

Preview of the content

Lecture 1

Unsupervised machine learning:
- Given a dataset {xi}, find some interesting properties (clustering, density estimation, generative models).
- A type of machine learning that learns from data without human supervision.
- Unsupervised machine learning models are given unlabeled data and allowed to discover patterns and insights without any explicit guidance or instruction.

Supervised machine learning (most common):
- Given a training dataset {xi, yi}, predict ŷi for previously unseen samples (regression -> yi is continuous, classification -> yi is categorical).
- A category of machine learning that uses labeled datasets to train algorithms to predict outcomes and recognize patterns.
- Supervised learning algorithms are given labeled training data to learn the relationship between the inputs and the outputs.


Notations:

- Y = outcome measurement (dependent variable/response/target)
  In regression Y is quantitative (e.g. price, blood pressure).
  In classification Y takes values in a finite, unordered set (e.g. survived/died, cancer class of a tissue sample).
  For both we need a training dataset: pairs of observations.
- X = vector of p predictor measurements (inputs/regressors/covariates/features/independent variables)
- Machine learning aims to 'learn' a model f that predicts the outcome Y given the input X:

Y = f(X) + ϵ

Epsilon captures measurement errors and other discrepancies.
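This setup can be sketched in a few lines of Python. The true function f and the noise level here are hypothetical, chosen only for illustration:

```python
import random

def f(x):
    """Hypothetical true relationship between input X and outcome Y."""
    return 2.0 * x + 1.0

random.seed(0)  # reproducible noise

xs = [0.5 * i for i in range(5)]
# Each observed outcome is the true signal plus noise epsilon
# (measurement errors and other discrepancies)
ys = [f(x) + random.gauss(0.0, 0.1) for x in xs]
```

Machine learning then tries to recover f from the observed pairs (xs, ys), without ever seeing f itself.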

The main aim of machine learning is to learn a model f that is applicable to new situations. On the basis of the training data we would like to:

- Accurately predict unseen test cases
- Understand which inputs affect the outcome, and how
- Assess the quality of our predictions

Classification:

- The feature space is the space in which you plot the data points.
- The k-nearest neighbours (k-NN) classifier is a supervised learning classifier which classifies or predicts the output for a given input based on its closest neighbours in the feature space.
- The value of k can be changed to achieve a better decision boundary. How to pick the k value is covered below; giving the model more data also helps.

ŷ ("y hat") is the notation for a prediction.

Formalized k-NN algorithm:

- xnew = [x0, x1] are the new features for which you want to predict the class ŷnew.
- Compute the distance between xnew and the existing points in your training dataset, using the Euclidean distance:
  d(xnew, xi) = sqrt((xnew,0 − xi,0)² + (xnew,1 − xi,1)²)
- Sort the samples by distance and pick the k nearest ones to the new example.
- Determine the classes of the k nearest training samples.
- Assign to xnew the majority class of its nearest training samples (neighbours).
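The steps above can be sketched as a small Python function. This is a minimal illustration on a made-up toy dataset; the names are my own, not from the lecture:

```python
import math
from collections import Counter

def knn_predict(x_new, X_train, y_train, k=3):
    """Predict the class of x_new by majority vote among its k nearest
    training samples, using the Euclidean distance."""
    # Distance from x_new to every training point, paired with its label
    distances = [(math.dist(x_new, x_i), y_i)
                 for x_i, y_i in zip(X_train, y_train)]
    # Sort by distance and keep the k nearest neighbours
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Majority class among the neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy dataset: two features per sample, two classes
X_train = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
y_train = ["A", "A", "B", "B"]

print(knn_predict((1.2, 1.5), X_train, y_train, k=3))  # -> A
```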

The k-NN algorithm can be extended:

- It can be used for more than two classes.
- It can be used for regression: instead of computing the majority class of the nearest neighbours, we compute the average target value y.
- Using a different distance metric is also common, for example the L1 (Manhattan) distance instead of the Euclidean distance: d(xnew, xi) = |xnew,0 − xi,0| + |xnew,1 − xi,1|
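The regression and L1-distance extensions can be sketched in the same style (again a minimal illustration with made-up data, not the lecture's code):

```python
def manhattan(p, q):
    """L1 (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def knn_regress(x_new, X_train, y_train, k=3, dist=manhattan):
    """k-NN regression: average the target values of the k nearest
    training samples instead of taking a majority vote."""
    distances = sorted((dist(x_new, x_i), y_i)
                       for x_i, y_i in zip(X_train, y_train))
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k

# Toy 1-D regression data
X_train = [(1.0,), (2.0,), (3.0,), (10.0,)]
y_train = [1.0, 2.0, 3.0, 10.0]

print(knn_regress((2.5,), X_train, y_train, k=3))  # -> 2.0
```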

Choosing the k value:

We need to choose k based on the performance on an independent test set -> no examples should be related to the ones in the training set.

Compute the error rate on both the training set and the test set, and determine a good k by looking at both curves and searching for the lowest test error rate.
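This model-selection procedure can be sketched as follows; the k-NN classifier and the tiny train/test split are hypothetical illustrations (a real split would use many more samples):

```python
import math
from collections import Counter

def knn_predict(x_new, X_train, y_train, k):
    """Majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted((math.dist(x_new, x_i), y_i)
                     for x_i, y_i in zip(X_train, y_train))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def error_rate(X, y, X_train, y_train, k):
    """Fraction of samples in (X, y) that the k-NN classifier misclassifies."""
    wrong = sum(knn_predict(x, X_train, y_train, k) != t
                for x, t in zip(X, y))
    return wrong / len(y)

# Hypothetical toy split; the test samples are independent of training
X_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
y_train = ["A", "A", "A", "B", "B", "B"]
X_test  = [(0.5, 0.5), (5.5, 5.5), (1.0, 1.0), (6.0, 6.0)]
y_test  = ["A", "B", "A", "B"]

# Compare both error curves and pick the k with the lowest test error
for k in (1, 3, 5):
    print(k,
          error_rate(X_train, y_train, X_train, y_train, k),  # training error
          error_rate(X_test, y_test, X_train, y_train, k))    # test error
```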

The error on the independent test dataset is called the generalization error: it tells us how well we can expect our classifier to generalize to new, unseen examples.

- Classifiers that produce simple decision boundaries can have higher training errors but usually generalize better to new samples.
- Classifiers that produce complex decision boundaries can have lower training errors but usually generalize worse to new samples.
- Complexity decreases as k gets bigger:
  Small values of k -> noise can have a large influence -> complex model -> overfitting -> low training error -> poor performance on new data, because the model knows the training set too well.
  Large values of k -> noise has little influence -> less complex model -> underfitting -> high training error -> too much generalization, missing important patterns.




Parametric models:

- The number of parameters is fixed.
- Once the model is trained (parameters are determined), we can
throw away the training dataset.
- Linear regression is an example.

Non-parametric models:

- The number of parameters is not fixed; it grows with the number of training samples.
- k-NN is an example of a non-parametric machine learning model.

Lecture 2

11-9-2024


Linear models for regression and
classification
In general:

- Fundamentals
- Model interpretation
- Estimation
- Model evaluation

Everything covered in the lectures is exam material -> look at the book for different interpretations.

Linear models = a combination of inputs (predictors, features or independent variables) used to predict the output.

Regression = the output is quantitative (a continuous variable).

Classification = the output is categorical (a binary variable, which can be extended to multiclass).



When to use a linear model? -> Look at the complexity of the relationship between the variables and the output.


Least squares method

[Figure: scatter plot with Y on the y-axis and X on the x-axis; you want to find the (red) regression line.]
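For simple linear regression (one input), the least-squares line has a closed-form solution. A minimal sketch, with made-up noise-free data so the fit recovers the line exactly:

```python
def least_squares_fit(xs, ys):
    """Fit y ≈ b0 + b1*x by minimizing the sum of squared residuals.
    Closed-form solution for simple linear regression:
        b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
        b0 = y_mean - b1 * x_mean
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
          / sum((x - x_mean) ** 2 for x in xs))
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Data generated from y = 1 + 2x without noise, so the fit is exact
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = least_squares_fit(xs, ys)
print(b0, b1)  # -> 1.0 2.0
```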