Summary Machine Learning

Pages: 61
Uploaded on: February 1, 2023
Written in: 2022/2023

English Summary of Machine Learning course of Master Data Science and Society at Tilburg University. A summary of lecture materials, readings, and notes.

Machine learning
Lecture 1

Machine learning is about automating problem solving. It is the study of computer
algorithms that improve automatically through experience: becoming better at a
task T, based on some experience E, with respect to some performance measure P.
Examples:
- Spam detection
- Movie recommendation
- Speech recognition
- Credit risk analysis
- Autonomous driving
- Medical diagnosis.
It comes up with a learned algorithm. It is about learning from experience.

What does it involve?
- ML may involve a notion of generalization. When the machine learns relationships
between the input and the output, we want this to work on unseen data, which is
the concept of generalization. Is it safe to assume that current observations
generalize to future observations?
- Annotated data, objective, optimization algorithm, features/representations,
assumptions are some critical components.
- We assume the dataset represents the population. With more data, the output
generally becomes better.
- There is an optimization algorithm that incrementally works towards the best
outcome.

Different types of learning:
Starting points:
- Supervised learning: annotated/labelled dataset / ground truth
o Classification: discrete variable
o Regression: continuous variable
- Unsupervised learning: unlabeled dataset
o clustering

Examples:
Spam vs non-spam?




This is usually a text mining problem. The emails have to be pre-processed in such a way
that we can create features from the dataset. This is a binary classification problem. The
learning algorithm should come up with a function that maps the representation of each
email to a label (spam or non-spam).
- Find examples of spam and non-spam
- Come up with a learning algorithm
- A learning algorithm infers rules from examples: if (A or B or C) and not D, then spam
- These rules can then be applied to new data (emails)
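
The rule form above can be sketched in Python. This is a hand-written illustration, not a learned model: the function name `is_spam` and the four keyword features standing in for A, B, C and D are hypothetical. A learning algorithm would infer such conditions from the labelled examples instead.

```python
def is_spam(email_text: str) -> bool:
    """Apply a rule of the form: if (A or B or C) and not D, then spam."""
    text = email_text.lower()
    a = "free money" in text      # hypothetical feature A
    b = "click here" in text      # hypothetical feature B
    c = "winner" in text          # hypothetical feature C
    d = "lecture notes" in text   # hypothetical feature D (signals non-spam)
    return (a or b or c) and not d


# Apply the rule to new, unseen emails.
new_emails = [
    "You are a WINNER, click here!",
    "Click here for the lecture notes",
    "See you tomorrow",
]
predictions = [is_spam(email) for email in new_emails]
```

Only the first email is flagged: the second matches B but is excluded by D, and the third matches none of A, B or C.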

Learning algorithms:
- See several different learning algorithms
- Implement 2-3 simple ones from scratch in Python
- Learn about Python libraries for ML (scikit-learn)
- How to apply them to real-world problems

Machine learning examples:
- Recognize handwritten numbers and letters
- Recognize faces in photos
- Determine whether text expresses positive, negative or no opinion
- Guess person’s age based on a sample of writing
- Flag suspicious credit-card transactions
- Recommend books and movies to users based on their own and others’ purchase
history
- Recognize and label mentions of people’s or organization names in text

Types of learning problems:
Regression:
- Response: a (real) number
- Predict a person’s age
- Predict price of stock
- Predict student’s score on exam
Binary classification:
- Response: Yes/No answer
- Detect spam
- Predict polarity of product review: positive vs negative
Multiclass classification:
- Response: one of a finite set of options
- Classify newspaper article as:
o Politics, sports, science, technology, health, finance
- Detect species based on photo
o Passer domesticus, Calidris alba, Streptopelia decaocto, Corvus corax
Multilabel classification:
- The output can consist of several labels at once rather than a single one
(this is the difference from multiclass classification)
- Assign songs to one or more genres (rock, pop, metal)
- The aim is not necessarily to get every label right, but to get as many labels
right as possible during training.
Autonomous behavior (example of a car)
- Input: measurements from sensors – camera, microphone, radar, accelerometer.
- Response: instructions for actuators – steering, accelerator, brake.
- Evaluation: choose a baseline, choose a metric, compare!
- Different tasks, different metrics:
o Predicting age
o Flagging spam

Two metrics that we often use in regression problems:
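- Mean absolute error (MAE): the average absolute difference between the true value
y_n (ground truth) and the predicted value ŷ_n:

$$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} \left| y_n - \hat{y}_n \right|$$

- Mean squared error (MSE): the average squared difference between the true value and
the predicted value. It is more sensitive to outliers than MAE, but it is differentiable
everywhere (MAE is not differentiable where the error is zero):

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left( y_n - \hat{y}_n \right)^2$$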
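
A minimal from-scratch sketch of the two regression metrics, using only the standard library. The age values are made up for illustration; the last prediction is a deliberate outlier to show how much more strongly MSE reacts to it.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Made-up example: predicting age; the last prediction is off by 20 years.
ages_true = [20, 30, 40, 50]
ages_pred = [22, 28, 40, 70]

mae = mean_absolute_error(ages_true, ages_pred)  # (2 + 2 + 0 + 20) / 4 = 6.0
mse = mean_squared_error(ages_true, ages_pred)   # (4 + 4 + 0 + 400) / 4 = 102.0
```

The single outlier contributes 20 of the 24 units of absolute error, but 400 of the 408 units of squared error.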



For binary classification problems, the metrics often used are:
- Accuracy
- Error rate
These can be misleading, especially if the dataset is not balanced.
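
To see why, consider a sketch with made-up data in which only 5 of 100 emails are spam (label 1) and a trivial classifier never flags anything. The helper name `accuracy` is just for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(y_true, y_pred) if y == p)
    return correct / len(y_true)


# Imbalanced made-up dataset: 5 spam emails (1), 95 non-spam emails (0).
y_true = [1] * 5 + [0] * 95
# A useless classifier that predicts "not spam" for every email.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)  # 95 correct out of 100
error_rate = 1 - acc
```

The classifier scores 95% accuracy while catching no spam at all, which is why metrics that focus on one kind of mistake are needed.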



Classification:
- False positive – flagged as spam, but not spam
- False negative – not flagged, but is spam
- False positives are a bigger issue for this problem!
- True positive – spam classified as spam
- True negative – not-spam classified as not-spam
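
The four outcomes can be counted with a short helper. This is a sketch with made-up labels, where 1 means spam and 0 means not spam; the function name `confusion_counts` is an assumption for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_pred):
        if p == positive and y == positive:
            tp += 1   # flagged, and really spam
        elif p == positive:
            fp += 1   # flagged, but not spam
        elif y == positive:
            fn += 1   # not flagged, but is spam
        else:
            tn += 1   # not flagged, and not spam
    return tp, fp, fn, tn


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
```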

Precision and recall:
- Metrics which focus on one kind of mistake
- Precision: what fraction of flagged emails were real spam?

$$\text{precision} = \frac{TP}{TP + FP}$$

- Recall: what fraction of real spam emails were flagged?

$$\text{recall} = \frac{TP}{TP + FN}$$
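
A small sketch of both metrics computed from counts of true positives (TP), false positives (FP) and false negatives (FN). The counts are made up, and returning 0.0 when the denominator is zero is a convention assumed here, not something the notes specify.

```python
def precision(tp, fp):
    """Fraction of flagged items that were truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 on empty denominator is an assumption


def recall(tp, fn):
    """Fraction of truly positive items that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0  # 0.0 on empty denominator is an assumption


# Made-up counts: 10 emails flagged, 8 of them real spam; 16 real spam emails in total.
p = precision(tp=8, fp=2)   # 8 / 10 = 0.8
r = recall(tp=8, fn=8)      # 8 / 16 = 0.5
```

Here the filter is rarely wrong when it flags an email, but it misses half of the spam.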


Example:

Confusion matrix example:




F-score:
- Harmonic mean of precision P and recall R (a kind of average):

$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$

- Also known as the F-measure

Fβ:
- The parameter β quantifies how much more we care about recall than precision. When
β is greater than 1, recall is weighted more; when it is smaller than 1, precision
is weighted more:

$$F_\beta = \frac{(1 + \beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R}$$
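
The effect of β can be checked numerically. This is a sketch using the standard Fβ definition, with made-up precision and recall values; the function name `f_beta` is hypothetical, and β = 1 gives the ordinary F-score.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # convention assumed here when both inputs are zero
    return (1 + b2) * precision * recall / denominator


# Made-up classifier: high precision, lower recall.
p, r = 0.8, 0.5

f1 = f_beta(p, r)              # balanced
f2 = f_beta(p, r, beta=2.0)    # recall weighted more
f05 = f_beta(p, r, beta=0.5)   # precision weighted more
```

Because recall is the weaker of the two values in this example, weighting it more (β = 2) lowers the score, and weighting precision more (β = 0.5) raises it.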



Multiclass classification:
You can still make a confusion matrix with multiclass classification as well.
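
As a sketch, a multiclass confusion matrix can be built by counting (true label, predicted label) pairs. The labels and predictions below are made up; rows are true classes and columns are predicted classes, which is a layout choice assumed here.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows: true class. Columns: predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


labels = ["politics", "sports", "science"]
y_true = ["politics", "politics", "sports", "sports", "science", "science"]
y_pred = ["politics", "sports", "sports", "sports", "science", "politics"]

matrix = confusion_matrix(y_true, y_pred, labels)
# Correct predictions lie on the diagonal of the matrix.
correct = sum(matrix[i][i] for i in range(len(labels)))
```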




When there are more than two classes, you have to come up with alternatives when it
comes to rating the learning outcomes. You can use macro-average and micro-average.

Macro-average:
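Precision is true positives over predicted (labeled) positives; recall is true positives
over actual positives.
- This works best when the data is balanced.
- Compute precision and recall per class, then average over the K classes:

$$P_{\text{macro}} = \frac{1}{K} \sum_{k=1}^{K} P_k, \qquad R_{\text{macro}} = \frac{1}{K} \sum_{k=1}^{K} R_k$$
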
- Rare classes have the same impact as frequent classes

Micro-average:
- Gives every point equal importance (this is the difference from the macro-average).
- Micro averaging treats the entire set of data as an aggregate result, and calculates 1
metric rather than k metrics that get averaged together
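
The difference between the two averages can be sketched for precision with a made-up two-class dataset in which class "a" is frequent and class "b" is rare. The function name is hypothetical, and the code assumes every predicted label appears in `labels`.

```python
def macro_micro_precision(y_true, y_pred, labels):
    """Return (macro-averaged, micro-averaged) precision."""
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    for y, p in zip(y_true, y_pred):
        if y == p:
            tp[p] += 1
        else:
            fp[p] += 1  # predicted class p, but the true class was different

    # Macro: one precision per class, then an unweighted mean over classes.
    per_class = [
        tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in labels
    ]
    macro = sum(per_class) / len(labels)

    # Micro: pool all counts first, then compute a single precision.
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    micro = total_tp / (total_tp + total_fp) if (total_tp + total_fp) else 0.0
    return macro, micro


# 8 examples of the frequent class "a", 2 of the rare class "b".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 7 + ["b"] + ["b", "a"]

macro, micro = macro_micro_precision(y_true, y_pred, ["a", "b"])
```

Per-class precision is 7/8 for "a" and 1/2 for "b". The macro-average lets the rare class pull the score down as much as the frequent class lifts it, while the micro-average is dominated by the many "a" examples.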