100% satisfaction guarantee Immediately available after payment Both online and in PDF No strings attached 4.2 TrustPilot
logo-home
Summary

Machine Learning: Comprehensive Summary

Rating
4,2
(5)
Sold
29
Pages
24
Uploaded on
05-12-2019
Written in
2019/2020

Very useful summary for the Machine Learning course at Tilburg University (Data Science and Society). Summary contains all parts that are discussed in the course (2019) extra examples, explanation and notes. Summary also contains many useful visualizations.

Show more Read less
Institution
Course












Whoops! We can’t load your doc right now. Try again or contact support.

Written for

Institution
Study
Course

Document information

Uploaded on
December 5, 2019
File latest updated on
December 14, 2019
Number of pages
24
Written in
2019/2020
Type
Summary

Subjects

Content preview

Machine Learning Tilburg University 2019-2020



MACHINE LEARNING
LECTURE 1: EVALUATION

Machine Learning: Machine learning studies algorithms which can learn to solve problems from
examples. It enables automate problem solving, like flagging spam in your e-mail or recognizing faces
in photos.

Rules: You need rules to solve these problems. Rules are learned from our examples, but we must
provide the format for these rules (decision tree or linear model for example).
• Example: If (A or B or C) and not D, then SPAM

Learning from examples (Machine Learning Approach):
• Find examples of SPAM and non-SPAM
• Come up with a learning algorithm
• A learning algorithm infers rules from examples
• These rules can then be applied to new data (e-mails in this specific case)

Learning Algorithms: Machine learning uses programmed algorithms that receive and analyze input
data to predict output values. As new data is fed to these algorithms, they learn and optimize their
operations to improve performance, developing 'intelligence' over time.

Types of Machine Learning problems:
• Regression: response will be a (real) number
o Predict person’s age, price of stock, score on an exam
• Binary classification: response will be Yes/No answer (classifying instances into one of two
classes)
o SPAM detection (yes / no), polarity of product review (positive / negative), predict
gender (male / female)
o Each sample is assigned to only ONE label
• Multiclass classification: response will be one of a finite set of options (classification task with
more than two classes, but each sample is assigned to only ONE label)
o Classify articles as politics, science, health, finance, sports etc.
• Multilabel classification: a finite set of Yes/No answers. There is no constraint on how many
of the classes the instance can be assigned.
o Assign songs to one or more genres. Each song may belong to several genres
simultaneously.
▪ Examples: {rock - pop - metal} OR {jazz - blues} OR {rock - punk}
• Ranking: Order objects according to relevance (ranking models for information retrieval
systems)
o Rank web pages in response to user query (like using Google to find information)
• Sequence Labeling: the algorithmic assignment of a categorical label to each member of a
sequence
o Input: sequence of elements (e.g. words)
o Response: a corresponding sequence of labels

1

,Machine Learning Tilburg University 2019-2020


▪Label words in a sentence with their syntactic category (determiner, verb,
noun etc.)
▪ Label frames in speech signal with corresponding phonemes
• Sequence-to-sequence modeling aims to map a fixed length input with a fixed length
o Input: a sequence of elements
o Response: another sequence of elements
▪ Possibly different length
▪ Possibly element from different sets
o Examples: translate between languages or summarizing texts

How to evaluate the success of our machine learning solutions: It depends on the type of problem
• Regression: Mean Absolute Error (average (absolute) difference between true value and
predicted value) or the Mean Squared Error (average square of the difference between true
value and predicted value)
• Classification: Error rate (the proportion of mistakes)
o Disadvantage: kind of mistake not considered and risk of disbalanced classes

Kinds of mistakes:
• False positive – Flagged as SPAM, but non-SPAM
• False negative – Not flagged, but is SPAM
• False positives are usually bigger problems!!!

Precision and Recall: Metrics which focus on one kind of mistake
• Precision – What fraction of flagged emails were real SPAMs?
• Recall – What fraction of real SPAMs were flagged?




F-score: Harmonic mean between precision and recall. It considers both the precision
(p) and the recall (r) of the test to compute the score. F-score reaches its best value at
1 (perfect precision and recall) and worst at 0.

2

,Machine Learning Tilburg University 2019-2020


Two other commonly used F measures are the F2 measure, which weighs recall higher than precision
(by placing more emphasis on false negatives), and the F0.5 measure, which weighs recall lower than
precision (by attenuating the influence of false negatives). → Fβ-score: Parameter β quantifies how
much more we care about recall than precision.




Macro-average: Compute precision and recall per-class, and average:
• Rare classes have the same impact as frequent ones. So, you must look at the scores per
class as well!
o Macro-average precision with two different sets = (P1+P2)/2
o Macro-average recall with two different sets = (R1+R2)/2

Micro-average: Only useful if we treat one class as a default class
• Treat each correct prediction as TP
• Treat each missing classification as FN
• Treat each incorrect prediction as FP

Micro-average of precision with two different sets = (TP1+TP2)/(TP1+TP2+FP1+FP2)
Micro-average of recall with two different sets = (TP1+TP2)/(TP1+TP2+FN1+FN2)

Using examples: Always use disjoint sets of examples (training set, development set, and test set)
• Always use the same evaluation metrics for development set and test set
• Important for evaluation to be close to true (real world) objective


LECTURE 2: DECISION TREES AND FORESTS

Learning Rules: nested if-then-else rules




Classification: The problem of identifying to which of a set of categories a new observation belongs.
Example: when classifying fruit, looking at the shape and color to figure out what type of fruit it is.

Decision tree: Model that predicts the value of a target variable based on several input variables.
Decision trees classify instances by sorting them down the tree from the root to some leaf node,
which provides the classification of the instance. Example:




3

,Machine Learning Tilburg University 2019-2020


• The very top of the three is
called the Root Node
• Internal Nodes have arrows
pointing to them and away
from them
• Leaf Nodes have arrows
pointing to them, but NOT
away from them (The last
node in a path)

Using a decision tree:
If NO  Prediction apply algo with
left branch

If YES  Prediction apply algo with
right branch

Building a decision tree: A decision tree is drawn upside down with its root at the top. The number
of possible trees grows exponentially with number of features.
• You can’t check them all and see which one works best
• So, we need to build a tree incrementally. Thus, selecting one particular node and consider it
as fixed if it is useful, and then continue with another node.

Which question to ask first?
• It is best to ask important questions first
• This is the one which helps us classify the best → If we had to classify data based on only
ONE question, which question would do best?

Step by step decision tree building:
1. Choose one question you could ask
2. Check how many correct answers you get by classifying based on this particular question only
3. Do this for each possible question
4. The first question in the model (decision tree) is the question that produced the most correct
answers
5. Go to next level of the tree and repeat this procedure (excl. the question you already asked)

Building a decision tree:
• If all examples have same label → Create leaf node with label

Otherwise:
• Choose most important question
• Split data into two parts (NO and YES) according to question
• Remove question from question set
• Left branch Apply algo to ← NO examples
• Right branch Apply algo to ← YES examples
• Create node with (question, left branch, right branch)


4

, Machine Learning Tilburg University 2019-2020


Digression: Recursion
• We build and use decision trees with recursive functions
• Recursive function: Function that calls itself during its execution. This enables the function to
repeat itself several times, outputting the result and the end of each iteration
• A recursive function is a function that calls itself until a “base condition” is true, and
execution stops

Trees: Recursively defined data structures
• Base case = Leaf node (Returns a value without making any following recursive calls)
• Recursive case = If the base case is false, enter the recursive condition (Case when the
function calls itself)  Branch node




Decision tree efficiency: Speed depends on number of questions needed to get to a leaf node. This
depends on the depth of the tree. The depth of a decision tree is the length of the longest path from
a root to a leaf ( count the number of branches!)
• Unbalanced trees: The left and right subtree have different number of nodes and depth
• Balanced trees: The left and right subtree have equal number of nodes and depth




Depth of balanced tree: In a balanced binary tree, each time you ask a question, you halve the
number of remaining questions → Example (N = 8, depth = 3) → (= Log (8) / Log (2) == 3)




5

Available practice questions

R91,23
Get access to the full document:
Purchased by 29 students

100% satisfaction guarantee
Immediately available after payment
Both online and in PDF
No strings attached

Reviews from verified buyers

Showing all 5 reviews
2 year ago

2 year ago

3 year ago

5 year ago

5 year ago

4,2

5 reviews

5
1
4
4
3
0
2
0
1
0
Trustworthy reviews on Stuvia

All reviews are made by real Stuvia users after verified purchases.

Get to know the seller

Seller avatar
Reputation scores are based on the amount of documents a seller has sold for a fee and the reviews they have received for those documents. There are three levels: Bronze, Silver and Gold. The better the reputation, the more your can rely on the quality of the sellers work.
dc070498 Tilburg University
Follow You need to be logged in order to follow users or courses
Sold
202
Member since
6 year
Number of followers
137
Documents
27
Last sold
1 month ago

3,8

24 reviews

5
2
4
17
3
3
2
2
1
0

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their exams and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can immediately select a different document that better matches what you need.

Pay how you prefer, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card or EFT and download your PDF document instantly.

Student with book image

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions