Lecture 1
Unsupervised machine learning:
- given a dataset {xi}, find some interesting properties (clustering, density
estimation, generative models).
- a type of machine learning that learns from data without human
supervision.
- Unsupervised machine learning models are given unlabeled data and
allowed to discover patterns and insights without any explicit guidance or
instruction.
Supervised machine learning (most common):
- given a training dataset {xi, yi}, predict ŷi for previously unseen
samples (regression -> yi is continuous, classification -> yi is categorical).
- a category of machine learning that uses labeled datasets to train
algorithms to predict outcomes and recognize patterns.
- supervised learning algorithms are given labeled training data to learn the
relationship between the inputs and the outputs.
Notations:
- Y = outcome measurement (dependent variable/response/target)
In regression Y is quantitative (e.g. price, blood pressure)
In classification Y takes values in a finite, unordered set
(e.g. survived/died, cancer class of a tissue sample).
For both we need a training dataset: pairs of observations (xi, yi).
- X = vector of p predictor measurements
(inputs/regressors/covariates/features/independent variables)
- Machine learning aims to ’learn’ a model f that predicts the outcome
Y given the input X:
Y = f (X) + ϵ
Epsilon (ϵ) captures measurement errors and other sources of noise.
The main aim of machine learning is that we want to create a formula f
that is applicable to different situations. On the basis of the training data
we would like to:
- Accurately predict unseen test cases
- Understand which inputs affect the outcome and how
- Assess the quality of our predictions
Classification:
- The feature space is the space in which you plot the data
points.
- k-Nearest neighbours (k-NN) classifier algorithm: a
supervised learning classifier, which classifies or
predicts the output for a given input based on
its closest neighbours in the feature space.
- The value of k can be changed to achieve a
better decision boundary.
How to pick the k-value: give more data to the model
(see 'Choosing the k value' below).
ŷ ('y hat') is the notation for a prediction.
Formalized k-NN algorithm:
- xnew = [x0, x1] are the features of a new sample for which you want
to predict the class ŷnew.
- Compute the distance between xnew and the existing points xi in your
training dataset, using the Euclidean distance:
d(xnew, xi) = √((xnew,0 − xi,0)² + (xnew,1 − xi,1)²)
- Sort the samples based on the distance and pick the k nearest ones
to the new example.
- Determine the class of the k nearest training samples.
- Assign to xnew the majority class of its nearest training samples
(neighbours).
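A minimal Python sketch of these steps (numpy only; the function name knn_predict and the toy data are illustrative, not from the lecture):

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Compute the Euclidean distance from x_new to every training point.
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # 2. Sort by distance and take the k nearest training samples.
    nearest = np.argsort(distances)[:k]
    # 3. Look up the classes of those k neighbours.
    neighbour_classes = y_train[nearest]
    # 4. Assign the majority class among the neighbours.
    values, counts = np.unique(neighbour_classes, return_counts=True)
    return values[np.argmax(counts)]

# Toy example: two features, two classes.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 1.5])))  # -> 0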
The k-NN algorithm can be extended:
- Using it for more than two classes
- Using k-NN for regression is also possible (instead of computing the
majority class of the nearest neighbours, we compute the average
target value y).
- Using a different distance metric is also common: for example the L1
distance instead of the Euclidean distance:
d(xnew, xi) = |xnew,0 − xi,0| + |xnew,1 − xi,1|
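A sketch of the last two extensions, assuming the same numpy setup as above (illustrative, not the lecture's code):

import numpy as np

def knn_regress(X_train, y_train, x_new, k=3, use_l1=False):
    if use_l1:
        # L1 (Manhattan) distance: sum of absolute coordinate differences.
        distances = np.sum(np.abs(X_train - x_new), axis=1)
    else:
        # Euclidean (L2) distance.
        distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]
    # Regression variant: average the target values of the k nearest
    # neighbours instead of taking a majority vote.
    return y_train[nearest].mean()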
Choosing the k value:
We need to choose k based on the performance
on an independent test set -> no examples
should be related to the ones in the training set.
Compute the error rate for the test set and the training
set, and determine a good k by plotting both error curves
and looking for the k with the lowest test error rate.
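A sketch of this procedure using scikit-learn (assuming it is installed; the synthetic dataset is a placeholder for your own data):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data: two features, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for k in [1, 3, 5, 15, 51]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_error = 1 - model.score(X_train, y_train)  # error rate = 1 - accuracy
    test_error = 1 - model.score(X_test, y_test)
    print(f"k={k}: train error {train_error:.2f}, test error {test_error:.2f}")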
The error on the independent test dataset is called the generalization
error: it tells us how well we can expect our classifier to generalize its
performance on new, unseen examples.
- Classifiers that produce simple decision boundaries can have higher
training errors but usually generalize better to new samples.
- Classifiers that produce complex decision boundaries can have lower
training errors but usually generalize worse to new samples.
- Model complexity decreases as k gets bigger:
Small values of k -> noise can have a large influence ->
complex model -> overfitting -> low training error -> poor performance on
new data, because the model knows the training set very well.
Large values of k -> noise has little influence -> less complex
model -> underfitting -> high training error -> too much generalization,
so important patterns are missed.
Parametric models:
- The number of parameters is fixed.
- Once the model is trained (parameters are determined), we can
throw away the training dataset.
- Linear regression is an example.
Non-parametric models:
- The number of parameters is not fixed, it grows with the number of
training samples.
- k-NN is an example of a non-parametric machine learning model.
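A small sketch of this contrast (using scikit-learn's LinearRegression; the data is made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(size=100)

# Parametric: after fitting, the model is fully described by a fixed
# number of parameters and the training data can be thrown away.
lin = LinearRegression().fit(X, y)
print(lin.intercept_, lin.coef_)  # two numbers, regardless of dataset size

# Non-parametric (k-NN): the 'model' is the stored training data itself,
# so its size grows with the number of training samples.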
Lecture 2
11-9-2024
Linear models for regression and
classification
In general:
- Fundamentals
- Model interpretation
- Estimation
- Model evaluation
Everything covered in the lectures is exam material; look at the book for
different interpretations.
Linear models = combination of inputs (predictors, features or
independent variables) to predict the output
Regression = output is quantitative (continuous variable)
Classification = output is categorical (binary variable, can be extended to
multiclass)
When to use a linear model? Look at the complexity of the relationship
between the variables and the output.
Least squares method:
[Figure: scatter plot of the data with Y on the y-axis and X on the x-axis;
least squares finds the best-fitting line (the red line in the plot).]
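A minimal sketch of fitting that line with ordinary least squares in numpy (the data here is synthetic, for illustration only):

import numpy as np

# Synthetic data: y depends linearly on x, plus noise (the epsilon term).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)

# Least squares picks the intercept b0 and slope b1 that minimize
# the sum of squared residuals, sum((y - b0 - b1*x)^2).
b1, b0 = np.polyfit(x, y, deg=1)  # polyfit returns highest degree first
print(f"fitted line: y = {b0:.2f} + {b1:.2f} * x")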