Lecture notes: Introduction to Machine Learning (8BB020)


This document contains everything you need to know for each lecture before the exam. Super handy to review.

Document information

Uploaded on
October 31, 2025
Number of pages
42
Written in
2024/2025
Type
Class notes
Professor(s)
Dr. Federica Eduati
Contains
All classes


Lecture 1

Unsupervised machine learning:
- Given a dataset {xi}, find some interesting properties (clustering, density
estimation, generative models).
- A type of machine learning that learns from data without human
supervision.
- Unsupervised machine learning models are given unlabeled data and
discover patterns and insights without any explicit guidance or
instruction.

Supervised machine learning (most common):
- Given a training dataset {xi, yi}, predict ŷi for previously unseen
samples (regression -> yi is continuous, classification -> yi is categorical).
- A category of machine learning that uses labeled datasets to train
algorithms to predict outcomes and recognize patterns.
- Supervised learning algorithms are given labeled training data to learn the
relationship between the inputs and the outputs.


Notations:

- Y = outcome measurement (dependent variable/response/target)
In regression Y is quantitative (e.g. price, blood pressure).
In classification Y takes values in a finite, unordered set
(e.g. survived/died, cancer class of a tissue sample).
For both we need a training dataset: pairs of observations (xi, yi).
- X = vector of p predictor measurements
(inputs/regressors/covariates/features/independent variables)
- Machine learning aims to 'learn' a model f that predicts the outcome
Y given the input X:

Y = f(X) + ϵ

Epsilon (ϵ) captures measurement errors and other discrepancies.

The main aim of machine learning is that we want to create a formula f
that is applicable to different situations. On the basis of the training data
we would like to:

- Accurately predict unseen test cases
- Understand which input affects the outcome and how
- Assess the quality of our predictions
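The model Y = f(X) + ϵ can be made concrete with a small simulation. This is a minimal sketch (the true f and the noise level are made up for illustration, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

# The true (normally unknown) relationship f
def f(x):
    return 2.0 * x + 1.0

# Training data: noisy observations Y = f(X) + epsilon
x_train = rng.uniform(0, 10, size=50)
epsilon = rng.normal(0, 1.0, size=50)   # measurement error / other discrepancies
y_train = f(x_train) + epsilon

# Any learned model f_hat can only approximate f; the noise epsilon
# is irreducible and sets a floor on the prediction error
print(y_train[:3])
```

Because ϵ is random, even a perfect estimate of f cannot predict each yi exactly.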

Classification:

- The feature space is the space in which you plot the data points.
- The k-nearest neighbours (k-NN) classifier is a supervised learning
algorithm that classifies or predicts the output for a given input
based on its closest neighbours in the feature space.
- The value of k can be changed to achieve a better decision boundary.
How to pick the k value: evaluate the model on data it has not
seen (see "Choosing the k value" below).

ŷ (y-hat) is the notation for a prediction.

Formalized k-NN algorithm:

- xnew = [x0, x1] are the new features for which you want to predict the
class ŷnew.
- Compute the distance between xnew and the existing points in your
training dataset, using e.g. the Euclidean distance:
d(xnew, xi) = sqrt((xnew,0 − xi,0)² + (xnew,1 − xi,1)²)
- Sort the samples by distance and pick the k nearest ones
to the new example.
- Determine the class of the k nearest training samples.
- Assign to xnew the majority class of its nearest training samples
(neighbours).
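The steps above can be sketched in a few lines of Python. This is a minimal illustration (the function name and the toy dataset are mine, not from the course):

```python
import numpy as np
from collections import Counter

def knn_predict(x_new, X_train, y_train, k=3):
    """Classify x_new by majority vote among its k nearest training
    points, using the Euclidean distance."""
    # 1. Compute the distance from x_new to every training sample
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # 2. Sort by distance and pick the k nearest
    nearest = np.argsort(dists)[:k]
    # 3.-4. Assign the majority class among those neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy 2D dataset with two classes
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(np.array([0.5, 0.5]), X, y, k=3))  # "A"
print(knn_predict(np.array([5.5, 5.5]), X, y, k=3))  # "B"
```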

k-NN can be extended:

- It can be used for more than two classes.
- It can be used for regression: instead of computing the
majority class of the nearest neighbours, we compute the average
target value y.
- Using a different distance metric is also common, for example the
L1 (Manhattan) distance instead of the Euclidean distance:
d(xnew, xi) = |xnew,0 − xi,0| + |xnew,1 − xi,1|
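The regression variant and the alternative metric can be combined in one small sketch (the function name and toy data are mine, assuming 1D inputs for simplicity):

```python
import numpy as np

def knn_regress(x_new, X_train, y_train, k=3, metric="l2"):
    """k-NN for regression: average the target values of the k nearest
    neighbours. metric="l1" uses the Manhattan distance."""
    diff = X_train - x_new
    if metric == "l1":
        dists = np.abs(diff).sum(axis=1)           # L1 distance
    else:
        dists = np.sqrt((diff ** 2).sum(axis=1))   # Euclidean distance
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()                 # average, not majority vote

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
print(knn_regress(np.array([1.4]), X, y, k=2))  # neighbours at x=1 and x=2 -> 1.5
```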

Choosing the k value:

We need to choose k based on the performance on an independent test
set -> no examples in the test set should be related to the ones in the
training set.

Compute the error rate on both the training set and the test set, and
determine a good k by looking at both curves and searching for the
lowest test error rate.
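This train/test comparison can be sketched as follows (the simulated two-class data and the split are mine, for illustration only):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

# Two noisy 2D Gaussian classes, split into a training set and an
# independent test set
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(2, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
idx = rng.permutation(120)
X_tr, y_tr = X[idx[:80]], y[idx[:80]]
X_te, y_te = X[idx[80:]], y[idx[80:]]

def error_rate(X_eval, y_eval, k):
    """Fraction of misclassified points when predicting with k-NN
    trained on (X_tr, y_tr)."""
    wrong = 0
    for x, t in zip(X_eval, y_eval):
        d = np.sqrt(((X_tr - x) ** 2).sum(axis=1))
        pred = Counter(y_tr[np.argsort(d)[:k]]).most_common(1)[0][0]
        wrong += pred != t
    return wrong / len(y_eval)

# k=1 gives zero training error (each point is its own nearest
# neighbour) but not zero test error: a classic sign of overfitting
for k in (1, 5, 15):
    print(k, error_rate(X_tr, y_tr, k), error_rate(X_te, y_te, k))
```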

The error on the independent test dataset is called the generalization
error: it tells us how well we can expect our classifier to generalize its
performance to new, unseen examples.

- Classifiers that produce simple decision boundaries can have higher
training errors but usually generalize better to new samples.
- Classifiers that produce complex decision boundaries can have lower
training errors but usually generalize worse to new samples.
- Complexity decreases as k gets bigger:
Small values of k -> noise can have a large influence -> complex
model -> overfitting -> low training error -> poor performance on
new data, because the model knows the training set too well.
Large values of k -> noise has a small influence -> less complex
model -> underfitting -> high training error -> too much
generalization, missing important patterns.




Parametric models:

- The number of parameters is fixed.
- Once the model is trained (parameters are determined), we can
throw away the training dataset.
- Linear regression is an example.

Non-parametric models:

- The number of parameters is not fixed, it grows with the number of
training samples.
- k-NN is an example of a non-parametric machine learning model.

Lecture 2

11-9-2024


Linear models for regression and
classification
In general:

- Fundamentals
- Model interpretation
- Estimation
- Model evaluation

Everything in the lectures is exam material -> look at the book for different interpretations.

Linear models = a combination of inputs (predictors, features or
independent variables) used to predict the output

Regression = the output is quantitative (a continuous variable)

Classification = the output is categorical (a binary variable, which can be extended to multiclass)



When to use a linear model? -> Look at the complexity of the relationship
between the variables and the output.


Least squares method




[Figure: scatter plot with Y on the y-axis and X on the x-axis; the goal is to find the red regression line that best fits the points.]
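For simple linear regression the least squares line has a closed-form solution. A minimal numerical sketch (the simulated data and true coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy linear data; least squares finds the line minimising the sum of
# squared vertical distances between the points and the line
x = rng.uniform(0, 10, 40)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, 40)

# Closed-form estimates for y = beta0 + beta1 * x:
# beta1 = cov(x, y) / var(x),  beta0 = mean(y) - beta1 * mean(x)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
print(beta0, beta1)  # close to the true intercept 2 and slope 3
```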