Summary - Machine Learning (880083-M-6)

Pages: 88
Uploaded: 02-02-2026
Written in: 2024/2025
This is a very detailed and extensive 88-page summary of the whole Machine Learning course. I watched every lecture and included everything from the slides. I also included my own notes and basically wrote down everything the lecturer said. Course given by Dr. Grzegorz Chrupała and Dr. Mojtaba Rostami Kandroodi.


Machine Learning

Lecture 1: Machine Learning
How can we automate problem solving? For example, flag spam in your inbox. You can use
a data-driven approach, which means that it is based on some data rather than on explicit
specification that we write. You can collect a data set and give it some structure, some
additional information. The model or a learning algorithm will then extract knowledge from
this data. You can call it learning from examples. This data set consists of examples. The
algorithm will learn from these examples how to solve the task of interest. Take for example
the following data set:

[Figure: example dataset of spam and non-spam emails]

You can tell that the left box is the one containing spam. We want our learning algorithm to
be designed in a way that it can look at these examples. You can tell it what a possible clue
could be, like a word or a phrase, or a punctuation symbol. Then it will figure out which of
these words or phrases etc. correlate with the classes (left or right). You’re trying to find
associations between these clues and the classes. Ideally, the model would be able to learn
that.

[Figure: example rules, an if-then-else logic rule and a weighted sum of features]

How does it do that? You can think of it, in a general sense, as learning some kind of rules. The rules can be formulated in very different ways, depending on the specifics of the learning algorithm we are talking about. See the example above. It could be an if-then-else type of rule, like a logic rule. In this case it is a fairly complicated rule which checks certain conditions, combines them with boolean operators such as 'or', 'and', and 'not', and then makes a determination about the class of the specific email. This is not really a typical rule used by machine learning algorithms, but it is something that could be the case.

Another alternative (see the lower formula): here you have some features extracted from the text (they could be words or some other features), indicated by the symbols x1, x2, x3. We assign a numerical weight to each of them and compute a weighted sum. If the weighted sum of these features is greater than or equal to zero, then we classify the email as spam.
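The weighted-sum rule can be sketched in a few lines of code. This is a minimal illustration, not the lecturer's implementation: the clue words, weights, and bias below are all invented for the example.

```python
# A minimal sketch of the weighted-sum rule: extract binary features
# (does the email contain a given clue?) and flag as spam if the
# weighted sum is >= 0. Clues, weights, and bias are made up.

def extract_features(email):
    """Binary features x1, x2, x3: presence of each hypothetical clue."""
    clues = ["winner", "free", "!!!"]
    return [1 if clue in email.lower() else 0 for clue in clues]

def classify(email, weights, bias):
    """Flag as spam if the weighted sum of features is >= 0."""
    x = extract_features(email)
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "SPAM" if score >= 0 else "NONSPAM"

weights = [2.0, 1.5, 1.0]   # hypothetical learned weights
bias = -1.0                  # negative bias: at least one clue must fire
```

For example, `classify("You are a WINNER, claim your FREE prize!!!", weights, bias)` yields `"SPAM"`, while an ordinary email with none of the clues scores below zero.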

Learning from examples
- Data: examples of SPAM and NONSPAM
- Learning algorithm learns rules from data
- Rules applied to new data, and evaluated against known labels
- Finally, the system can be deployed

A typical machine learning project or system first depends on data. Then there is a learning algorithm: a precisely determined procedure which goes through this data and extracts rules in some way. Once this is complete, the rules can be applied to new data. We can give part of the data to the model to learn from and keep the rest to evaluate the model. The model outputs some responses, and since we already know the answers for this portion of the data, we can evaluate how well the model is doing. This is repeated a few times: you change some things, like collecting more data or changing the algorithm, or do something else that improves performance. After these three parts are done, we can deploy the system, meaning that we can use it in the real world.
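The learn-apply-evaluate loop can be sketched end to end with a toy learner. The data and the "learning rule" below (weight each word by how often it appears in spam versus non-spam) are invented purely to make the loop concrete:

```python
# Toy end-to-end sketch: learn rules from labelled examples, apply them
# to held-out data, and evaluate. Data and learner are invented.

def train(examples):
    """Weight each word by how often it appears in spam vs. non-spam."""
    weights = {}
    for text, label in examples:
        for word in text.lower().split():
            weights[word] = weights.get(word, 0) + (1 if label == "SPAM" else -1)
    return weights

def predict(weights, text):
    score = sum(weights.get(word, 0) for word in text.lower().split())
    return "SPAM" if score > 0 else "NONSPAM"

def accuracy(weights, examples):
    hits = sum(predict(weights, text) == label for text, label in examples)
    return hits / len(examples)

train_data = [("win free money", "SPAM"), ("free prize now", "SPAM"),
              ("meeting notes attached", "NONSPAM"), ("see agenda attached", "NONSPAM")]
held_out = [("free money now", "SPAM"), ("meeting agenda", "NONSPAM")]

rules = train(train_data)           # learn from examples
acc = accuracy(rules, held_out)     # evaluate on data the learner never saw
```

The held-out examples play the role of the evaluation data described above: we know their labels, so we can score the model's responses.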

There are multiple algorithms, such as decision trees. Decision trees extract rules in a format quite similar to the logic type of rules, the if-then-else type. You also have logistic regression, which uses a format of rules more similar to the second example, where you have a weighted sum of the features and make a decision based on that.

Learning algorithms
- Tree-based: decision trees, random forests, gradient boosted trees.
- Linear classifiers and regressors: perceptron, linear and logistic regression.
- Neural networks: multi-layer perceptrons, deep learning.
- Richard Feynman: "What I cannot create, I do not understand"
- A prerequisite for understanding something at a very deep level and in precise detail is being able to create it, being able to build it.

Types of ML problems
What is the structure of the problem? What is the object that you are trying to predict? What is the nature of the variable that we are predicting, the output? A machine learning algorithm, through the rules it extracts from data, tries to map inputs to outputs. In the case of spam, the input is the subject and text of the email, and the output is the label: whether it is spam or not. Depending on what the output of the problem is, what the target of the prediction is, we often talk about a specific type of ML problem.

Real number. Regression.
Probably the most basic type of prediction is a real number, like predicting someone's age or the price of something: a scalar, a single number which can be fairly arbitrary. We call this type of problem a regression problem. We then have models such as linear regression, a specific type of model which has a very simple structure and predicts a number.
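For a single feature, linear regression has a closed-form least-squares solution, which can be sketched directly. The data points below are invented (they lie exactly on y = 2x + 1):

```python
# Minimal sketch of one-feature linear regression fitted with the
# closed-form least-squares solution. Data points are invented.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]                     # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)   # recovers 2.0 and 1.0
```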

Yes/No. Binary classification.
Another very simple machine learning problem, which we have already seen in the case of
email, spam detection, is called binary classification. It involves an output which is a binary
variable: yes/no. There are only two options. For example for sentiment prediction for a
piece of text: we can say it is either positive or negative. Sometimes you also have ‘neutral’
as a value, but then it is no longer a binary prediction problem.

In a more complex case: let's say that you have an app that gives pictures of birds, trying to
determine which species it belongs to. This cannot be binary classification because there are
more than two species of birds.

One of a set of options. Multi-class classification.
Some examples include detecting these bird species, but also classifying newspaper articles
based on the topic, concerning topics such as politics, sports, science, finance, etc. There is
only one of a set of options. We only choose one of them. If we are classifying bird species,
every individual will belong to one species and not to multiple. This is the key property of
multi-class classification that it involves one of a set of options. In multi-class classification
we have multiple classes overall but each individual example belongs to only one of them.

[Figure: multilabel example, with multiple labels applied to the same picture]

Multiple labels from a set of options. Multilabel classification.
Multi-class is different from multilabel. In multilabel classification we can have multiple labels for the same example. This is a bit like multivariate regression, where you do regression for two different variables at the same time, for example longitude and latitude; multilabel is the analogous situation for discrete variables. For multilabel classification we have a set of options, a set of discrete variables. Note that multilabel classification is equivalent to a set of yes/no answers: you could enumerate all the labels (see picture above) and, for each picture, ask for each label whether it applies or not. So multilabel classification is like multiple binary classifications happening at the same time (good thing to note!).

Another example of multilabel classification may be something like classifying pieces of music, songs, into genres. Often these genres are a bit too rigid, so you may want to apply two of them to the same song (it could be jazz but also blues at the same time, combining elements from both).
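The observation that multilabel classification decomposes into independent yes/no decisions can be shown with a toy example. The genres and the per-genre decision rule (here just a tag lookup) are invented:

```python
# Sketch: multilabel classification as one binary yes/no classifier per
# label. The genre list and the decision rule are invented.

GENRES = ["jazz", "blues", "rock"]

def binary_decision(song_tags, genre):
    """One yes/no classifier per genre (here a simple tag lookup)."""
    return genre in song_tags

def multilabel_classify(song_tags):
    """Run every binary classifier and collect the labels that fire."""
    return [g for g in GENRES if binary_decision(song_tags, g)]

labels = multilabel_classify({"jazz", "blues"})   # both labels apply
```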

An ordering of objects. Ranking.
Here we have a query. The search engine returns a number of documents ordered by how
relevant they are to this query. Here, the target of prediction is not just a label or a set of
labels and also not a number, but it is an ordered sequence. It is an ordering or ranking on
top of the set of candidate objects, in this case documents/web pages. That is known as
ranking. → Not that important for the exam since it is a bit more advanced.




A label for each element in input sequence. Sequence labelling.
The picture below illustrates a task known as transcription, automatic speech recognition, or speech to text. Here there are different representations of the human voice: a waveform and some kind of spectrogram. In the middle is the desired output, in this case a transcription in some alphabet ("it rains a lot in portland"). We can assign a label to each frame, for example which phoneme it belongs to, so that for each of these points we have a specific phoneme. This is known as sequence labelling: we have a label for each element in an input sequence. It is similar to multilabel classification, but instead of a single element we have a sequence of elements, and for each of them we give a label. The issue here is that the labels depend on each other. If you are transcribing a spoken utterance, like a speech recording, the label at a particular point in time depends on what the person said before. There is this sequential structure in the labels and in the input, and for this reason we usually don't consider each element in isolation; we consider the whole sequence as the input and the whole sequence of outputs as the target of prediction. For sequence labelling, we usually treat it as a 1-to-1 mapping: for each element in the input sequence, we give one label in the output sequence, so there is a 1-to-1 mapping between objects and labels. Another example is labelling words in a sentence with their grammatical category, which could be useful for some type of linguistic analysis. However, this setting is relatively constrained by the 1-to-1 mapping.

Sequence to sequence prediction.
Often there is a more general type of problem, like the one illustrated below, where you have a sequence on one side and a sequence on the other side. What kind of task are we talking about here? → Translation, which involves a sequence on the input side and on the output side, but with no 1-to-1 mapping between the inputs and outputs. Together they express the same meaning, but there is no simple mapping between them. In the example, 'Gallia' is the first word in the input but it corresponds to 'France', the third word of the output. There is this kind of complicated mapping: reordering of the elements of the sequence and even more complicated relations. In the case of translating texts between different languages, as in the example here, we have a lot of reordering and many-to-many relations. This is known as sequence to sequence prediction, or sequence to sequence modelling, which is a common problem. Here, the input is a sequence of elements and the target is another sequence of elements.




Side info: supervised and unsupervised learning are two terms that correspond to whether or not we have labels, examples of the desired targets, in our data. For example, for text summarisation: if we have the full text together with the shorter text, it is supervised learning. But if we only have the longer texts and a learning mechanism which can shorten them without having examples of the shortened versions, that would be unsupervised learning. There is no supervision in the sense of outputs; we only have examples of inputs. → These were the main types of problems at the abstract level.

Evaluation
Evaluation is an important aspect of a machine learning project. You will spend a large proportion of the time evaluating the performance of the system, and based on how it performs, you will have to tune it again, change things, and redesign it. → Iterative process.

The examples for which we know the output, the labels, we use for training the system; the learning algorithm extracts some information from them. Then we have something which corresponds to a mock exam: a set of examples not used for training the model but used for interim evaluation of the model in order to tune it further. This is known as the validation set. Finally, we have a set of examples that we don't use for anything else and only use for the final evaluation, which is known as the test set.

Splitting data for evaluation.
● Training set.
○ Learn, infer rules
● Validation (or development) set
○ Monitor performance, choose best learning options
● Test set
○ Evaluate generalisation to real-world setting
○ Not accessible in advance

These three sets of data have very specific roles and they shouldn't be mixed. That is important: it is a best practice that has been developed over the years in the machine learning community, and you often see something similar in modern statistical approaches to data analysis. You have a training set which the machine uses to learn; you have the validation set, sometimes called the development set, which you use for choosing the best learning options and for monitoring whether performance improves as you change things. Then you have a completely separate test set that you don't use for anything else, which allows you to simulate real-world performance: the model is faced with completely unknown data and has to perform in this scenario.

How do we decide which examples from our whole data set go into the training, validation, and test data? What is the procedure to split the data?
- K-fold cross-validation is a specific way of splitting the data into training and
validation in multiple ways. Not the most common way.
- Completely random split: we decide how large we want our training set,
validation set, and test set to be, and then randomly assign data points to one of them
based on the sizes that we want. Typically we want something like 80% of the data for
training, 10% for validation, and 10% for testing. But of course there are some
considerations which we want to take into account, and often we use some
alternatives.
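The completely random 80/10/10 split can be sketched as follows; the proportions and the toy data are placeholders:

```python
import random

# Sketch of a completely random 80/10/10 train/validation/test split.

def random_split(data, train_frac=0.8, val_frac=0.1, seed=0):
    data = data[:]                       # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)    # random assignment via shuffling
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]        # whatever remains, roughly 10%
    return train, val, test

train, val, test = random_split(list(range(100)))
```

Fixing the random seed makes the split reproducible, which matters when you want to compare different learning options on the same validation data.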

Alternatives? In what circumstances?
One of the alternatives is cross-validation. In what circumstances would we choose cross-validation instead of a single, regular validation set?
- Sample size: if you have a small data set and split 10% off of it, you could end up with a pretty unrepresentative sample. Instead of taking just one validation set to check how things are working, you may want to repeat this a few times, say 5-10, take different consecutive samples from the data, and repeat the training and the validation. Then you average all these numbers and you have a more reliable number to guide you in your training and tuning procedure.

That is cross-validation. If you do it 10 times, it is called 10-fold cross-validation (k = 10). You are using some of the data for training and some for validation repeatedly; you just change which portion of the data is used for training and which for validation.

Another case that is also sometimes considered is stratification. The picture represents stratified sampling from a population. What is the point of stratification? → To get a representative sample, specifically representative of the classes, the strata, in our data. It is not necessarily representative in terms of other attributes, but at least we want to make sure all the classes are represented to the degree that they are present in the data. In this example, we take one third of every class (the white ones, the grey ones, and the black ones) and we end up with a representative sample. When is this important in machine learning? → Maybe you have a skewed class distribution and you want to make sure that all of these classes appear in your validation data. If you sample completely at random without stratification, you could end up with the validation set containing only the most common classes but not the less common ones. This is important in general, but even more so if your examples involve people from different categories that you want to capture as part of the population. Even without regard to social bias, a bias in the data can cause you to underestimate or overestimate the performance on some classes. So it is important if you have class skew / unbalanced classes in your data.
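Stratified sampling can be sketched by grouping examples per class and taking the same fraction from every group. The class labels below are invented to mirror the white/grey/black picture; for simplicity this deterministic sketch takes the first members of each class rather than random ones:

```python
# Sketch of stratified sampling: take the same fraction of every class
# so each class is represented proportionally in the sample.

def stratified_sample(items, labels, fraction):
    """Take `fraction` of each class (deterministically: the first members)."""
    by_class = {}
    for item, label in zip(items, labels):
        by_class.setdefault(label, []).append(item)
    sample = []
    for label, members in by_class.items():
        n = round(len(members) * fraction)
        sample.extend(members[:n])
    return sample

items = list(range(9))
labels = ["white"] * 3 + ["grey"] * 3 + ["black"] * 3
sample = stratified_sample(items, labels, 1 / 3)   # one item per class
```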

Another case where we have to be careful about how we split the data is when we have time series, or data which is distributed along the temporal dimension in some way. If you have prices of a product which change over time, or anything with some kind of temporal evolution to it, we don't want to split it into training and evaluation completely at random. We rather make sure that we only use the past to predict the future, and not vice versa. When we have a time series, we make sure that our training data comes from a period of time before the period our validation or test data comes from.

We can also have a version where we do cross-validation and gradually increase the training folds, but always keep evaluating on something which comes after (see picture below). These are the considerations which are important to keep in mind when splitting the data into these different categories.
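This expanding-window scheme can be sketched directly: training folds grow over time, and the evaluation fold is always the period that comes right after the training data.

```python
# Sketch of expanding-window time-series cross-validation: the training
# set grows over time, and evaluation is always on the period after it.

def expanding_window_splits(n_points, n_folds):
    fold_size = n_points // (n_folds + 1)
    for i in range(1, n_folds + 1):
        train = list(range(0, i * fold_size))                  # all of the past
        val = list(range(i * fold_size, (i + 1) * fold_size))  # the next period
        yield train, val

splits = list(expanding_window_splits(8, 3))
```

Every training index precedes every validation index in each fold, enforcing the "only use the past to predict the future" rule.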

Evaluation metrics. What is important to consider?
In terms of evaluation, we also want to consider what metrics we are using. What is important when deciding on a metric? → We should use the same evaluation metric on the validation (development) data and on the test data: whatever we use to validate the model and fine-tune the learning algorithm, we should also use for the test. Using one metric for one and a different metric for the other is not ideal. Another important consideration is that the evaluation metric should reflect the true objective of the learning project. The system has to perform a certain task, and the way we measure its performance should match all the different considerations we have in the real world: if the model makes mistakes, what are the consequences, and how do they affect you or the users of your system? Ideally, your evaluation metric should capture all of these things as closely as possible.




In terms of evaluation metrics for regression, we have these formulas. The mean absolute error is the most straightforward way to measure the performance of a regression model: it takes the absolute differences between the predicted number and the real number (the true answer) and averages them over the whole dataset. The mean squared error is similar, but instead of taking the absolute value of the difference, we take the square of the difference. In both cases, you get rid of the negative numbers. There is a difference between the two with regard to how close they stay to the original scale of the values: if you square the differences, you distort the original scale, because you look at the squares of the differences instead of the actual distance from the true answer. Mean squared error is not ideal from that point of view, and mean absolute error does not have that problem. If you just want to evaluate your system, mean absolute error is easier to understand because it does not change the scale.

Another metric that is often used is R squared. It is closely related to mean squared error, but it is normalised by the overall variance of the targets, and it is also subtracted from one, so for R squared higher is better, rather than lower being better as for mean squared error.
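The three regression metrics can be written out directly; the true and predicted values below are invented numbers:

```python
# Sketches of the regression metrics: mean absolute error, mean squared
# error, and R squared (MSE normalised by target variance, subtracted
# from one). Example numbers are invented.

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
```

On these numbers MAE is 1.0 while MSE is about 1.67: squaring weights the error of 2 more heavily than the error of 1, which is exactly the scale distortion discussed above.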




For classification we have metrics such as error rate and accuracy, which is defined as one minus the error rate. These are useful and straightforward but not always ideal. One reason is class imbalance: if you have a very unbalanced class distribution, you can get very high accuracy without ever predicting the minority class. Another, more serious issue can also arise when using accuracy and error rate.




There are type 1 and type 2 errors, terms from statistics which correspond to different kinds of mistakes you can be making. They are often called false positives vs. false negatives. Think about the spam classification example: the emails which are flagged as spam and really are spam are true positives, but not everything flagged as spam is actually spam (false positives), and not all spam gets flagged (false negatives).




This is an example of how we can display this information in tabular form, the so-called confusion matrix. For example, if the true class were okay here but the prediction was spam, that would be one type of error.

There are cases where we care about these different types of mistakes to different degrees, as in spam classification. A false negative is not such a big deal: it is just a spam email which got through, and we simply delete it. However, a false positive is an email that you miss: it is deleted automatically or moved to the spam folder, and you will most likely not see it at all, so it is a much more serious mistake. There are metrics which focus on these different types of mistakes: precision and recall. Precision is the proportion of true positives among all the flagged emails. Recall is the proportion of true positives among all the actual spam. If we want good precision, it is important that we avoid false positives. If we want good recall, it is important that we avoid false negatives. You might think that if we care only about false positives, we should use just precision, but that is not ideal, because precision and recall on their own are often quite easy to game.




Hence, we typically combine precision and recall into a single score, with a formula which trades off these two metrics using a beta parameter that tells us how much more we care about recall than about precision. If beta is equal to 0.5, it means that we care half as much about recall as about precision. If it is 2, it means that we care twice as much about recall as about precision. If we set beta to 1, it means that we care equally about precision and recall, and we have the F1 score. In general it is called the F score, and it has this beta parameter which we can use to trade things off.
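Precision, recall, and the F-beta score can be computed directly from the confusion-matrix counts; the counts in the example are invented:

```python
# Sketch of precision, recall, and the F-beta score from confusion
# counts (tp = true positives, fp = false positives, fn = false
# negatives). Example counts are invented.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_beta(p, r, beta):
    """Weighted harmonic mean: beta > 1 favours recall, beta < 1 precision."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# e.g. 8 true positives, 2 false positives, 2 false negatives
p = precision(8, 2)          # 8 of 10 flagged emails really are spam
r = recall(8, 2)             # 8 of 10 actual spam emails got flagged
f1 = f_beta(p, r, beta=1)    # beta = 1: the F1 score
```

With beta = 1 the formula reduces to the familiar harmonic mean 2pr / (p + r), so when precision and recall are equal, F1 equals that same value.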

If we are doing unsupervised learning, it just means that the model itself is not learning from labels. But when we evaluate the model, we still need some labelled data; otherwise we don't know what the model is doing at all. So regardless of whether the model is doing supervised learning from labels or unsupervised learning without any labels, we as developers need some amount of labelled data so we can evaluate the model.
