
Data Science: Machine Learning 2017/2018 - Summary Lectures

Pages: 32
Uploaded on: 20-06-2018
Written in: 2017/2018

Full summary including an introduction of Machine Learning and algorithms, such as Decision Tree, Perceptron, Gradient Descent, Logistic Regression (classifier) and Neural Networks. This summary also includes a section about Feature Engineering. Extra context and illustrations/graphs are also given, which makes this field of study a bit more understandable.


Machine Learning
Lecture 1 – Introduction

You can have a collection of rules that tells the program what to do. You can write these rules by hand, apply
them, and test them. You then see whether they work and change them if they do not. You are automating the task,
but you are still writing the rules by hand. Machine Learning takes automation a step further: we want the machine
itself to learn the rules. How would that work? For a task such as SPAM detection, you need to collect some
information about the distribution of words or word sequences.
• Learn from examples, based on supervised learning
• Find examples of SPAM and non-SPAM
• Come up with a learning algorithm
• A learning algorithm infers rules from examples
• These rules can then be applied to new data (emails)
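
The steps above can be sketched in a few lines of Python. This is only an illustrative toy, not the algorithm used in the lectures: the function names, the `min_ratio` threshold and the four training emails are all made up for the example. The "rule" that is learned is simply a set of words that occur clearly more often in SPAM than in non-SPAM.

```python
from collections import Counter

def learn_spam_words(examples, min_ratio=2.0):
    """Infer a rule set from labelled examples: words that occur at least
    `min_ratio` times more often in SPAM than in non-SPAM (hypothetical rule)."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, label in examples:
        target = spam_counts if label else ham_counts
        target.update(text.lower().split())
    # Add 1 to the non-SPAM count so unseen words do not cause division by zero.
    return {w for w, c in spam_counts.items() if c / (ham_counts[w] + 1) >= min_ratio}

def is_spam(text, spam_words):
    """Apply the learned rule to new data: flag a message containing any learned word."""
    return any(w in spam_words for w in text.lower().split())

# Made-up training examples: (email text, is it SPAM?)
training = [
    ("win free money now", True),
    ("free money offer", True),
    ("meeting notes for monday", False),
    ("lunch on monday", False),
]

spam_words = learn_spam_words(training)
print(sorted(spam_words))                            # ['free', 'money']
print(is_spam("claim your free money", spam_words))  # True
print(is_spam("monday meeting agenda", spam_words))  # False
```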

Types of learning problems
A machine learning problem has an input space and an output space. The nature of the output determines which
kind of learning problem we are dealing with.

Regression
Regression involves estimating or predicting a response. The response (the output variable) takes continuous
values, i.e. it is a real number.
• Predict a person’s age
• Predict the price of a stock
• Predict a student’s score on an exam

Binary classification
The output variable is a class label with only two possible values: a yes/no answer, e.g.
true/false or 1/0.
• Detect SPAM
• Predict the polarity of a product review: positive or negative
• Predict gender: male or female

Multiclass classification
The output is one of a finite set of options. The number of labels / classes / categories can run into the thousands.
Each training point belongs to one of n different classes. The goal is to construct a function which, given a new
data point, will correctly predict the class to which the new point belongs.
• Classify the subject of newspaper articles: politics, sports, science, technology, health, etc.
• Detect species based on a photo: Passer domesticus, Calidris alba, etc.

Multilabel classification
Multilabel classification is a classification problem where multiple target labels can be assigned to each
observation instead of only one. A multilabel classifier has to produce a vector of output values. Each output is
a yes/no answer, so you can think of it as one binary classification per label.
• Assign songs to one or more genres:
o {rock, pop, metal}
o {hip-hop, rap}
o {jazz, blues}
o {rock, punk}
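
A minimal sketch of what such an output vector could look like for the genre example. The `GENRES` list, its ordering and the helper name `to_label_vector` are assumptions made for illustration only.

```python
# Fixed order of all possible labels (assumed for this example).
GENRES = ["rock", "pop", "metal", "hip-hop", "rap", "jazz", "blues", "punk"]

def to_label_vector(song_genres, all_genres=GENRES):
    """Encode a set of genres as a 0/1 vector: one yes/no answer per genre."""
    return [1 if genre in song_genres else 0 for genre in all_genres]

print(to_label_vector({"rock", "punk"}))    # [1, 0, 0, 0, 0, 0, 0, 1]
print(to_label_vector({"hip-hop", "rap"}))  # [0, 0, 0, 1, 1, 0, 0, 0]
```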

Ranking
Order objects according to relevance. Ranking models are used in information retrieval systems. Training data
consists of lists of items with some partial order specified between the items in each list.
• Rank web pages in response to a user query
• Predict a student’s preference for courses in a program

Sequence labelling
A type of pattern recognition task that involves the algorithmic assignment of a categorical label to each member
of a sequence of observed values (e.g. part-of-speech tagging). The input is a sequence of elements (words) and the
response is a corresponding sequence of labels.
• Label words in a sentence with their syntactic category
• Label frames in a speech signal with the corresponding phonemes (W, ð, Ɛ, ɚ)

o Sequence labelling: N inputs, N outputs
o Sequence to sequence: N inputs, M outputs (N not necessarily = M)

Autonomous behaviour
The inputs are measurements from sensors – camera, microphone, radar, accelerometer, etc. – and the responses
are instructions for actuators – steering, accelerator, brake, etc.
Supervised learning is very often improved with reinforcement learning, which learns from a sequence of actions
using positive and negative feedback. Supervised learning is not the end of the story, and sometimes it is not
really applicable. Unsupervised learning has also become a very important approach.

In what situation do you use F1 score instead of accuracy?

Evaluation
How well is the algorithm learning? You can evaluate the performance by using different evaluation metrics.

Mean Absolute Error
The average absolute difference between the true value and the predicted value.

MAE = (1/n) · Σ |true_i − predicted_i|

Mean Squared Error
The average of the squared difference between the true value and the predicted value.

MSE = (1/n) · Σ (true_i − predicted_i)²

The aforementioned metrics can be used for predicting age (regression, numerical output), with a preference for
MSE. Because the differences are squared, the MSE exaggerates outliers (large errors), and the MAE does not.
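
A small sketch of both metrics on made-up age predictions shows the effect of a single outlier. The function names and the numbers are illustrative assumptions, not taken from the lectures.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

true_ages = [20, 30, 40, 50]
predicted = [22, 28, 41, 70]  # the last prediction is an outlier (off by 20)

print(mean_absolute_error(true_ages, predicted))  # 6.25
print(mean_squared_error(true_ages, predicted))   # 102.25
```

The single error of 20 contributes 400 of the 409 total squared error, but only 20 of the 25 total absolute error.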

Accuracy
Accuracy is calculated as the number of correct predictions divided by the total number of examples in the
dataset. The best accuracy is 1.0, whereas the worst is 0.0. It can also be calculated as 1 − error rate.

(TP + TN) / (P + N)

Error rate
The error rate is the proportion of mistakes. It is calculated as the number of incorrect predictions divided by
the total number of examples in the dataset. The best error rate is 0.0, whereas the worst is 1.0.

(FP + FN) / (P + N)

Predicting gender could use either accuracy or the error rate as the evaluation metric. For flagging SPAM, however,
the error rate is preferred: if accuracy is 99 percent, you would probably report the error rate (1 percent) instead.
Is there any disadvantage? The error rate does not take into account whether a false negative is worse than a false
positive.
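
The disadvantage can be made concrete with a short sketch. Both functions follow the formulas above, with P + N written out as TP + FN + TN + FP; the two classifiers and their counts are invented for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (P + N), where P = TP + FN and N = TN + FP."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """(FP + FN) / (P + N)."""
    return (fp + fn) / (tp + tn + fp + fn)

# Two hypothetical SPAM filters evaluated on 1000 emails each.
filter_a = dict(tp=20, tn=970, fp=10, fn=0)  # flags 10 legitimate emails as SPAM
filter_b = dict(tp=10, tn=980, fp=0, fn=10)  # lets 10 SPAM emails through

for name, counts in [("A", filter_a), ("B", filter_b)]:
    print(name, accuracy(**counts), error_rate(**counts))
```

Both filters get the same accuracy and error rate, even though one only makes false positive mistakes and the other only false negative mistakes.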

Precision and recall
These metrics are a useful measure of prediction success when the classes are very imbalanced. In information
retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant results
are returned. Each metric focuses on one kind of mistake and is computed from the sizes of certain sets (TP, FP, FN).
• Precision
The ratio of correctly predicted positive observations to all predicted positive observations (of
all passengers labelled as survived, how many actually survived? / what fraction of flagged emails
were real SPAM?)

Precision = TP / (TP + FP)

• Recall
The ratio of correctly predicted positive observations to all observations in the actual class "yes" (of all
the passengers that truly survived, how many did we label as survived? / what fraction of real SPAM was
flagged as SPAM?)

Recall = TP / (TP + FN)

True Positives (TP) = the correctly predicted positive values
True Negatives (TN) = the correctly predicted negative values
False Positives (FP) = when actual class is no and predicted class is yes
False Negatives (FN) = when actual class is yes but predicted class is no
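
As a rough sketch, the four counts and the two metrics can be computed from lists of true and predicted labels. The label lists are made up, 1 stands for "yes" (e.g. SPAM), and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the lectures.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 = yes and 0 = no."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 is an assumed convention

def recall(tp, fn):
    """Fraction of truly positive examples that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0  # 0.0 is an assumed convention

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                 # 2 3 1 2
print(f"{precision(tp, fp):.2f}")     # 0.67
print(f"{recall(tp, fn):.2f}")        # 0.50
```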

F-score
The harmonic mean of precision and recall, also known as the F-measure or F1. It is a kind of average that takes
both false positives and false negatives into account.

F1 = 2 · Precision · Recall / (Precision + Recall)

Fbeta
The parameter beta quantifies how much more we care about recall than about precision, so it gives the two
different importance. F0.5 means that we care half as much about recall as about precision. Beta therefore
determines the weight of recall relative to precision in the combined score: beta < 1 lends more weight to
precision, while beta > 1 favors recall.

Fbeta = (1 + beta²) · Precision · Recall / (beta² · Precision + Recall)
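
A brief sketch of the standard Fbeta formula, (1 + beta²) · Precision · Recall / (beta² · Precision + Recall), shows how beta shifts the score; beta = 1 gives the ordinary F-score (F1). The precision and recall values are made up, and returning 0.0 for a zero denominator is an assumed convention.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # assumed convention when both precision and recall are 0
    return (1 + b2) * precision * recall / denominator

# A hypothetical classifier with low precision but perfect recall.
p, r = 0.5, 1.0
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(p, r, beta), 3))
```

Because recall is the stronger of the two values here, the score rises as beta grows: roughly 0.556 for F0.5, 0.667 for F1 and 0.833 for F2.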




What is the difference between precision/recall, F-score and Fbeta?
F1 is usually more useful than accuracy, especially if you have an uneven class distribution. Accuracy works best
if false positives and false negatives have similar cost.

Macro-average (multi-class classification)
It computes the F-score per class and then averages: the metric is calculated for each class independently, and the
unweighted mean is taken. This does not take label imbalance into account, so rare classes have the same impact as
frequent classes. This can be a good thing or a bad thing, depending on what you want.
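
A minimal sketch of macro-averaged F1, assuming single-label multi-class data. The helper names and the ten example labels (a frequent class "sports" and a rare class "science") are invented for illustration.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating that class as 'yes' and all others as 'no'."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # assumed convention: no true positives means F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, label) for label in labels) / len(labels)

y_true = ["sports"] * 8 + ["science"] * 2
y_pred = ["sports"] * 8 + ["sports", "science"]

print(round(per_class_f1(y_true, y_pred, "sports"), 3))   # 0.941
print(round(per_class_f1(y_true, y_pred, "science"), 3))  # 0.667
print(round(macro_f1(y_true, y_pred), 3))                 # 0.804
```

The rare class "science" has only two examples, yet its lower score counts for half of the macro-average.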

Micro-average (multi-class classification)
This calculates metrics globally by counting the total number of times each class was correctly predicted and
incorrectly predicted. The counting is done on a case-by-case basis:
• Treat each correct prediction as TP
• Treat each missing classification as FN
• Treat each incorrect prediction as FP
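
The counting rules above can be sketched as follows. This assumes single-label multi-class data in which every example receives exactly one prediction, so a wrong prediction counts once as FP (for the predicted class) and once as FN (for the true class that was missed). The function names and example labels are made up.

```python
def micro_counts(y_true, y_pred):
    """Pool TP, FP and FN over all classes, case by case."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == t:
            tp += 1  # correct prediction
        else:
            fp += 1  # class p was predicted incorrectly
            fn += 1  # class t was missed
    return tp, fp, fn

def micro_f1(y_true, y_pred):
    """F1 computed from the pooled counts."""
    tp, fp, fn = micro_counts(y_true, y_pred)
    if tp == 0:
        return 0.0  # assumed convention: no true positives means F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = ["sports"] * 8 + ["science"] * 2
y_pred = ["sports"] * 8 + ["sports", "science"]

print(micro_counts(y_true, y_pred))        # (9, 1, 1)
print(round(micro_f1(y_true, y_pred), 3))  # 0.9
```

Under this single-label assumption the micro-averaged F1 equals plain accuracy (9 correct out of 10), so frequent classes dominate the score.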