
Lecture notes: Introduction to Machine Learning (8BB020)


This document contains, per lecture, everything you need to know for the exam. Very handy to review.













Document information

Uploaded on: 31 October 2025
Number of pages: 42
Written in: 2024/2025
Type: Lecture notes
Lecturer(s): Dr. Federica Eduati
Covers: All lectures

Preview of the content

Lecture 1

Unsupervised machine learning:
- Given a dataset {xi}, find some interesting properties (clustering, density estimation, generative models).
- A type of machine learning that learns from data without human supervision.
- Unsupervised machine learning models are given unlabeled data and allowed to discover patterns and insights without any explicit guidance or instruction.

Supervised machine learning (most common):
- Given a training dataset {xi, yi}, predict ŷi for previously unseen samples (regression -> yi is continuous, classification -> yi is categorical).
- A category of machine learning that uses labeled datasets to train algorithms to predict outcomes and recognize patterns.
- Supervised learning algorithms are given labeled training data to learn the relationship between the inputs and the outputs.


Notations:

- Y = outcome measurement (dependent variable/response/target)
  In regression Y is quantitative (e.g. price, blood pressure).
  In classification Y takes values in a finite, unordered set (e.g. survived/died, cancer class of a tissue sample).
  For both we need a training dataset: pairs of observations.
- X = vector of p predictor measurements (inputs/regressors/covariates/features/independent variables)
- Machine learning aims to 'learn' a model f that predicts the outcome Y given the input X:

Y = f(X) + ϵ

Epsilon captures measurement errors and other discrepancies.
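This setup can be sketched in a few lines of Python. The true function f and the noise level here are hypothetical, chosen only for illustration:

```python
import random

def f(x):
    """Hypothetical true relationship between input X and outcome Y."""
    return 2.0 * x + 1.0

random.seed(0)  # reproducible noise

xs = [0.5 * i for i in range(5)]
# Each observed outcome is the true signal plus noise epsilon
# (measurement errors and other discrepancies)
ys = [f(x) + random.gauss(0.0, 0.1) for x in xs]
```

Machine learning then tries to recover f from the observed pairs (xs, ys), without ever seeing f itself.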

The main aim of machine learning is to learn a model f that is applicable to new situations. On the basis of the training data we would like to:

- Accurately predict unseen test cases
- Understand which inputs affect the outcome, and how
- Assess the quality of our predictions

Classification:

- The feature space is the space in which you plot the data points.
- The k-nearest neighbours (k-NN) classifier is a supervised learning classifier which classifies or predicts the output for a given input based on its closest neighbours in the feature space.
- The value of k can be changed to achieve a better decision boundary. How to pick the k value is covered below; giving the model more data also helps.

ŷ ("y hat") is the notation for a prediction.

Formalized k-NN algorithm:

- xnew = [x0, x1] are the new features for which you want to predict the class ŷnew.
- Compute the distance between xnew and the existing points in your training dataset, using the Euclidean distance:
  d(xnew, xi) = sqrt((xnew,0 − xi,0)² + (xnew,1 − xi,1)²)
- Sort the samples by distance and pick the k nearest ones to the new example.
- Determine the classes of the k nearest training samples.
- Assign to xnew the majority class of its nearest training samples (neighbours).
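The steps above can be sketched as a small Python function. This is a minimal illustration on a made-up toy dataset; the names are my own, not from the lecture:

```python
import math
from collections import Counter

def knn_predict(x_new, X_train, y_train, k=3):
    """Predict the class of x_new by majority vote among its k nearest
    training samples, using the Euclidean distance."""
    # Distance from x_new to every training point, paired with its label
    distances = [(math.dist(x_new, x_i), y_i)
                 for x_i, y_i in zip(X_train, y_train)]
    # Sort by distance and keep the k nearest neighbours
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Majority class among the neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy dataset: two features per sample, two classes
X_train = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
y_train = ["A", "A", "B", "B"]

print(knn_predict((1.2, 1.5), X_train, y_train, k=3))  # -> A
```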

The k-NN algorithm can be extended:

- It can be used for more than two classes.
- It can be used for regression: instead of computing the majority class of the nearest neighbours, we compute the average target value y.
- Using a different distance metric is also common, for example the L1 (Manhattan) distance instead of the Euclidean distance: d(xnew, xi) = |xnew,0 − xi,0| + |xnew,1 − xi,1|
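The regression and L1-distance extensions can be sketched in the same style (again a minimal illustration with made-up data, not the lecture's code):

```python
def manhattan(p, q):
    """L1 (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def knn_regress(x_new, X_train, y_train, k=3, dist=manhattan):
    """k-NN regression: average the target values of the k nearest
    training samples instead of taking a majority vote."""
    distances = sorted((dist(x_new, x_i), y_i)
                       for x_i, y_i in zip(X_train, y_train))
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k

# Toy 1-D regression data
X_train = [(1.0,), (2.0,), (3.0,), (10.0,)]
y_train = [1.0, 2.0, 3.0, 10.0]

print(knn_regress((2.5,), X_train, y_train, k=3))  # -> 2.0
```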

Choosing the k value:

We need to choose k based on the performance on an independent test set -> no examples should be related to the ones in the training set.

Compute the error rate on both the training set and the test set, and determine a good k by looking at both curves and searching for the lowest test error rate.
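This model-selection procedure can be sketched as follows; the k-NN classifier and the tiny train/test split are hypothetical illustrations (a real split would use many more samples):

```python
import math
from collections import Counter

def knn_predict(x_new, X_train, y_train, k):
    """Majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted((math.dist(x_new, x_i), y_i)
                     for x_i, y_i in zip(X_train, y_train))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def error_rate(X, y, X_train, y_train, k):
    """Fraction of samples in (X, y) that the k-NN classifier misclassifies."""
    wrong = sum(knn_predict(x, X_train, y_train, k) != t
                for x, t in zip(X, y))
    return wrong / len(y)

# Hypothetical toy split; the test samples are independent of training
X_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
y_train = ["A", "A", "A", "B", "B", "B"]
X_test  = [(0.5, 0.5), (5.5, 5.5), (1.0, 1.0), (6.0, 6.0)]
y_test  = ["A", "B", "A", "B"]

# Compare both error curves and pick the k with the lowest test error
for k in (1, 3, 5):
    print(k,
          error_rate(X_train, y_train, X_train, y_train, k),  # training error
          error_rate(X_test, y_test, X_train, y_train, k))    # test error
```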

The error on the independent test dataset is called the generalization error: it tells us how well we can expect our classifier to generalize to new, unseen examples.

- Classifiers that produce simple decision boundaries can have higher training errors but usually generalize better to new samples.
- Classifiers that produce complex decision boundaries can have lower training errors but usually generalize worse to new samples.
- Complexity decreases as k gets bigger:
  Small values of k -> noise can have a large influence -> complex model -> overfitting -> low training error -> poor performance on new data, because the model knows the training set too well.
  Large values of k -> noise has little influence -> less complex model -> underfitting -> high training error -> too much generalization, missing important patterns.




Parametric models:

- The number of parameters is fixed.
- Once the model is trained (parameters are determined), we can
throw away the training dataset.
- Linear regression is an example.

Non-parametric models:

- The number of parameters is not fixed; it grows with the number of training samples.
- k-NN is an example of a non-parametric machine learning model.

Lecture 2

11-9-2024


Linear models for regression and
classification
In general:

- Fundamentals
- Model interpretation
- Estimation
- Model evaluation

Everything covered in the lectures is exam material -> look at the book for different interpretations.

Linear models = a combination of inputs (predictors, features or independent variables) used to predict the output.

Regression = the output is quantitative (a continuous variable).

Classification = the output is categorical (a binary variable, which can be extended to multiclass).



When to use a linear model? -> Look at the complexity of the relationship between the variables and the output.


Least squares method

[Figure: scatter plot with Y on the y-axis and X on the x-axis; you want to find the (red) regression line.]
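For simple linear regression (one input), the least-squares line has a closed-form solution. A minimal sketch, with made-up noise-free data so the fit recovers the line exactly:

```python
def least_squares_fit(xs, ys):
    """Fit y ≈ b0 + b1*x by minimizing the sum of squared residuals.
    Closed-form solution for simple linear regression:
        b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
        b0 = y_mean - b1 * x_mean
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
          / sum((x - x_mean) ** 2 for x in xs))
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Data generated from y = 1 + 2x without noise, so the fit is exact
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = least_squares_fit(xs, ys)
print(b0, b1)  # -> 1.0 2.0
```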