Summary

Machine Learning: Comprehensive Summary

Name: Machine Learning: Comprehensive Summary
SKU: doc_619446
Rating: 4.20 (5 reviews)
Author: dc070498

Rating

4,2

(5)

Sold

Pages

Uploaded on

05-12-2019

Written in

2019/2020

Very useful summary for the Machine Learning course at Tilburg University (Data Science and Society). Summary contains all parts that are discussed in the course (2019) extra examples, explanation and notes. Summary also contains many useful visualizations.

Show more Read less

Institution

Course

Whoops! We can’t load your doc right now. Try again or contact support.

Report Copyright Violation

Written for

Institution: Tilburg University (UVT)
Study: Data Science & Society
Course: Machine Learning (880083M6)

All documents for this subject (16)

Document information

Uploaded on: December 5, 2019
File latest updated on: December 14, 2019
Number of pages: 24
Written in: 2019/2020
Type: Summary

Subjects

machine learning
logistic regression
decision tree
neural networks
artificial intelligence
perceptron
multilayer perceptron
gradient descent
data science
random forests
stochastic gradient descent

Content preview

Machine Learning Tilburg University 2019-2020

MACHINE LEARNING
LECTURE 1: EVALUATION

Machine Learning: Machine learning studies algorithms which can learn to solve problems from
examples. It enables automate problem solving, like flagging spam in your e-mail or recognizing faces
in photos.

Rules: You need rules to solve these problems. Rules are learned from our examples, but we must
provide the format for these rules (decision tree or linear model for example).
• Example: If (A or B or C) and not D, then SPAM

Learning from examples (Machine Learning Approach):
• Find examples of SPAM and non-SPAM
• Come up with a learning algorithm
• A learning algorithm infers rules from examples
• These rules can then be applied to new data (e-mails in this specific case)

Learning Algorithms: Machine learning uses programmed algorithms that receive and analyze input
data to predict output values. As new data is fed to these algorithms, they learn and optimize their
operations to improve performance, developing 'intelligence' over time.

Types of Machine Learning problems:
• Regression: response will be a (real) number
o Predict person’s age, price of stock, score on an exam
• Binary classification: response will be Yes/No answer (classifying instances into one of two
classes)
o SPAM detection (yes / no), polarity of product review (positive / negative), predict
gender (male / female)
o Each sample is assigned to only ONE label
• Multiclass classification: response will be one of a finite set of options (classification task with
more than two classes, but each sample is assigned to only ONE label)
o Classify articles as politics, science, health, finance, sports etc.
• Multilabel classification: a finite set of Yes/No answers. There is no constraint on how many
of the classes the instance can be assigned.
o Assign songs to one or more genres. Each song may belong to several genres
simultaneously.
▪ Examples: {rock - pop - metal} OR {jazz - blues} OR {rock - punk}
• Ranking: Order objects according to relevance (ranking models for information retrieval
systems)
o Rank web pages in response to user query (like using Google to find information)
• Sequence Labeling: the algorithmic assignment of a categorical label to each member of a
sequence
o Input: sequence of elements (e.g. words)
o Response: a corresponding sequence of labels

1

,Machine Learning Tilburg University 2019-2020

▪Label words in a sentence with their syntactic category (determiner, verb,
noun etc.)
▪ Label frames in speech signal with corresponding phonemes
• Sequence-to-sequence modeling aims to map a fixed length input with a fixed length
o Input: a sequence of elements
o Response: another sequence of elements
▪ Possibly different length
▪ Possibly element from different sets
o Examples: translate between languages or summarizing texts

How to evaluate the success of our machine learning solutions: It depends on the type of problem
• Regression: Mean Absolute Error (average (absolute) difference between true value and
predicted value) or the Mean Squared Error (average square of the difference between true
value and predicted value)
• Classification: Error rate (the proportion of mistakes)
o Disadvantage: kind of mistake not considered and risk of disbalanced classes

Kinds of mistakes:
• False positive – Flagged as SPAM, but non-SPAM
• False negative – Not flagged, but is SPAM
• False positives are usually bigger problems!!!

Precision and Recall: Metrics which focus on one kind of mistake
• Precision – What fraction of flagged emails were real SPAMs?
• Recall – What fraction of real SPAMs were flagged?

F-score: Harmonic mean between precision and recall. It considers both the precision
(p) and the recall (r) of the test to compute the score. F-score reaches its best value at
1 (perfect precision and recall) and worst at 0.

2

,Machine Learning Tilburg University 2019-2020

Two other commonly used F measures are the F2 measure, which weighs recall higher than precision
(by placing more emphasis on false negatives), and the F0.5 measure, which weighs recall lower than
precision (by attenuating the influence of false negatives). → Fβ-score: Parameter β quantifies how
much more we care about recall than precision.

Macro-average: Compute precision and recall per-class, and average:
• Rare classes have the same impact as frequent ones. So, you must look at the scores per
class as well!
o Macro-average precision with two different sets = (P1+P2)/2
o Macro-average recall with two different sets = (R1+R2)/2

Micro-average: Only useful if we treat one class as a default class
• Treat each correct prediction as TP
• Treat each missing classification as FN
• Treat each incorrect prediction as FP

Micro-average of precision with two different sets = (TP1+TP2)/(TP1+TP2+FP1+FP2)
Micro-average of recall with two different sets = (TP1+TP2)/(TP1+TP2+FN1+FN2)

Using examples: Always use disjoint sets of examples (training set, development set, and test set)
• Always use the same evaluation metrics for development set and test set
• Important for evaluation to be close to true (real world) objective

LECTURE 2: DECISION TREES AND FORESTS

Learning Rules: nested if-then-else rules

Classification: The problem of identifying to which of a set of categories a new observation belongs.
Example: when classifying fruit, looking at the shape and color to figure out what type of fruit it is.

Decision tree: Model that predicts the value of a target variable based on several input variables.
Decision trees classify instances by sorting them down the tree from the root to some leaf node,
which provides the classification of the instance. Example:

3

,Machine Learning Tilburg University 2019-2020

• The very top of the three is
called the Root Node
• Internal Nodes have arrows
pointing to them and away
from them
• Leaf Nodes have arrows
pointing to them, but NOT
away from them (The last
node in a path)

Using a decision tree:
If NO  Prediction apply algo with
left branch

If YES  Prediction apply algo with
right branch

Building a decision tree: A decision tree is drawn upside down with its root at the top. The number
of possible trees grows exponentially with number of features.
• You can’t check them all and see which one works best
• So, we need to build a tree incrementally. Thus, selecting one particular node and consider it
as fixed if it is useful, and then continue with another node.

Which question to ask first?
• It is best to ask important questions first
• This is the one which helps us classify the best → If we had to classify data based on only
ONE question, which question would do best?

Step by step decision tree building:
1. Choose one question you could ask
2. Check how many correct answers you get by classifying based on this particular question only
3. Do this for each possible question
4. The first question in the model (decision tree) is the question that produced the most correct
answers
5. Go to next level of the tree and repeat this procedure (excl. the question you already asked)

Building a decision tree:
• If all examples have same label → Create leaf node with label

Otherwise:
• Choose most important question
• Split data into two parts (NO and YES) according to question
• Remove question from question set
• Left branch Apply algo to ← NO examples
• Right branch Apply algo to ← YES examples
• Create node with (question, left branch, right branch)

4

, Machine Learning Tilburg University 2019-2020

Digression: Recursion
• We build and use decision trees with recursive functions
• Recursive function: Function that calls itself during its execution. This enables the function to
repeat itself several times, outputting the result and the end of each iteration
• A recursive function is a function that calls itself until a “base condition” is true, and
execution stops

Trees: Recursively defined data structures
• Base case = Leaf node (Returns a value without making any following recursive calls)
• Recursive case = If the base case is false, enter the recursive condition (Case when the
function calls itself)  Branch node

Decision tree efficiency: Speed depends on number of questions needed to get to a leaf node. This
depends on the depth of the tree. The depth of a decision tree is the length of the longest path from
a root to a leaf ( count the number of branches!)
• Unbalanced trees: The left and right subtree have different number of nodes and depth
• Balanced trees: The left and right subtree have equal number of nodes and depth

Depth of balanced tree: In a balanced binary tree, each time you ask a question, you halve the
number of remaining questions → Example (N = 8, depth = 3) → (= Log (8) / Log (2) == 3)

5

Available practice questions

R91,23

Get access to the full document:

Purchased by 29 students

100% satisfaction guarantee

Immediately available after payment

Both online and in PDF

No strings attached

Get to know the seller

dc070498

3,8

(24)

Reviews from verified buyers

Showing all 5 reviews

mdegal Data Science & Society · 15 reviews

2 year ago

Mathias Business Administration: Business Administration / International Business Administration

2 year ago

Chris2021 Data Science & Society · 10 reviews

3 year ago

JoseOrozco77 Data Science · 2 reviews

5 year ago

martintielemans1 Information Management · 17 reviews

5 year ago

4,2

5 reviews

Trustworthy reviews on Stuvia

All reviews are made by real Stuvia users after verified purchases.

Get to know the seller

dc070498 Tilburg University

View profile

Sold

202

Member since

6 year

Number of followers

137

Documents

Last sold

1 month ago

3,8

24 reviews

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their exams and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can immediately select a different document that better matches what you need.

Pay how you prefer, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card or EFT and download your PDF document instantly.

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions

What do I get when I buy this document?

You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.

Satisfaction guarantee: how does it work?

Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.

Who am I buying this summary from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller dc070498. Stuvia facilitates payment to the seller.

Will I be stuck with a subscription?

No, you only buy this summary for R91,23. You're not tied to anything after your purchase.

Can Stuvia be trusted?

4.6 stars on Google & Trustpilot (+1000 reviews) 48957 documents were sold in the last 30 days Founded in 2010, the go-to place to buy summaries for 15 years now

Machine Learning: Comprehensive Summary

Written for

Document information

Subjects

Content preview

More courses for Tilburg University (UVT) > Data Science & Society

Available practice questions

Reviews from verified buyers

Get to know the seller

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Didn't get what you expected? Choose another document

Pay how you prefer, start learning right away

Frequently asked questions

What do I get when I buy this document?

Satisfaction guarantee: how does it work?

Who am I buying this summary from?

Will I be stuck with a subscription?

Can Stuvia be trusted?