Complete WEEK1 note: Machine Learning & Learning Algorithms(BM05BAM)

Uploaded on: 12-03-2024

Written in: 2023/2024

THIS IS A COMPLETE NOTE FROM ALL BOOKS + LECTURE! Save your time for internships and other courses by studying from this note! Are you a 1st/2nd-year Business Analytics and Management student at RSM who wants to survive the block 2 Machine Learning module? Are you overwhelmed by 30 pages of reading every week, full of brand-new terms and formulas? If you don't know where to start, are struggling to keep up because of other courses, or simply want to learn about machine learning, I've got you covered. I passed the course Machine Learning & Learning Algorithms at RSM with a 7.6, WITHOUT A TECHNICAL BACKGROUND before this Master. If you come from a non-technical bachelor program, this note will guide you to the knowledge you need to pass the exam and complete the assignments. If you already have some machine learning knowledge, this note will make your life easier and give your grade a boost.


HLM: Chapter 1, The Machine Learning Landscape

Machine learning = field of study that gives computers the ability to learn without being
explicitly programmed.

Types of machine learning systems: added to the ISLR Chapter 2 notes.
Main Challenges of Machine Learning:
Generalization
Generalization problems can be caused by sampling bias or overfitting.

It is crucial to use a training set that is representative of the cases you want to generalize
to. This is often harder than it sounds: if the sample is too small, you will have sampling
noise (i.e., nonrepresentative data as a result of chance, outliers, or data errors).

However, even very large samples can be nonrepresentative if the sampling method is
flawed. This is called sampling bias.
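
The difference between sampling noise and sampling bias can be seen in a small simulation. This is only a sketch under stated assumptions: the population is synthetic, and the "flawed sampling method" (only values above 50 can be reached) is a hypothetical example, not something from the book.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population (an assumption for illustration): 100,000 values, mean near 50.
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)


def spread_of_sample_means(values, sample_size, repeats=200):
    """Standard deviation of the sample mean across repeated random samples."""
    means = [statistics.fmean(rng.sample(values, sample_size)) for _ in range(repeats)]
    return statistics.pstdev(means)


# Sampling noise: means of small random samples scatter far more than means of large ones.
noise_small = spread_of_sample_means(population, sample_size=10)
noise_large = spread_of_sample_means(population, sample_size=1000)

# Sampling bias: a flawed method that can only reach values above 50.
# Even a very large sample drawn this way misses the true mean.
reachable = [v for v in population if v > 50]
biased_mean = statistics.fmean(rng.sample(reachable, 20_000))

print(f"true mean: {true_mean:.2f}")
print(f"spread of sample means, n=10:   {noise_small:.2f}")
print(f"spread of sample means, n=1000: {noise_large:.2f}")
print(f"mean of a large but biased sample: {biased_mean:.2f}")
```

A larger sample shrinks the noise, but it does nothing for the biased sample: only fixing the sampling method helps there.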

Regularization: Constraining a model to make it simpler and reduce the risk of overfitting.
The amount of regularization to apply during learning can be controlled by a
hyperparameter. A hyperparameter is a parameter of a learning algorithm (not of the
model).
- It must be set prior to training and remains constant during training.
- If you set the regularization hyperparameter to a very large value, you will get an
almost flat model (a slope close to zero).
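
To see the "almost flat model" effect, here is a minimal sketch of a one-feature linear model with an L2 (ridge-style) penalty on the slope. The data, the function name `fit_ridge_line`, and the name `alpha` for the regularization hyperparameter are assumptions made for illustration. Note that `alpha` is fixed before fitting (hyperparameter), while the slope and intercept are estimated from the data (parameters).

```python
import random

rng = random.Random(1)

# Hypothetical data: y = 3x + 2 plus Gaussian noise.
xs = [i / 10 for i in range(50)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]


def fit_ridge_line(xs, ys, alpha):
    """Minimize sum((y - intercept - slope * x)^2) + alpha * slope^2.

    The intercept is not penalized, so the closed-form solution is
    slope = Sxy / (Sxx + alpha) and intercept = mean_y - slope * mean_x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# alpha is chosen by us before training; slope and intercept are learned.
slope_none, intercept_none = fit_ridge_line(xs, ys, alpha=0.0)
slope_huge, intercept_huge = fit_ridge_line(xs, ys, alpha=1e6)

print(f"alpha=0:   slope={slope_none:.3f}")
print(f"alpha=1e6: slope={slope_huge:.5f}")
```

With no regularization the slope stays close to the true value of 3; with a very large `alpha` it shrinks toward zero and the model predicts roughly the mean of `ys` everywhere.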

Hyperparameter vs Parameter
- A model parameter is
o estimated during model training.
o internally optimized
- A hyperparameter must be
o specified before model training.
o optimized externally.

Concept drift: this happens when the relationship the model has estimated changes after
the model is trained, because of an external change in the circumstances being modeled.
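
A small sketch of concept drift, assuming a synthetic relationship whose slope changes from 2.0 to 0.5 after training (the numbers and the helper names `make_data`, `fit_line`, and `mse` are hypothetical). The model itself does not change; the world it was fitted to does.

```python
import random

rng = random.Random(2)


def make_data(slope, n=200):
    """Draw (x, y) pairs from y = slope * x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0, 0.5)) for x in xs]


def fit_line(data):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, data):
    """Mean squared error of a (slope, intercept) model on the given data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in data) / len(data)


model = fit_line(make_data(slope=2.0))        # trained before the drift
mse_before = mse(model, make_data(slope=2.0)) # new data, same relationship
mse_after = mse(model, make_data(slope=0.5))  # new data, relationship has changed

print(f"error before drift: {mse_before:.2f}")
print(f"error after drift:  {mse_after:.2f}")
```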

Testing and Validating:
A better option than trying the model out directly on new data is to split your data into
two sets: the training set and the test set. This lets you check the performance before
moving on to actual practice. As these names imply, you train your model using the training set,
and you test it using the test set.
- It is common to use 80% of the data for training and hold out 20% for testing.
However, this depends on the size of the dataset:

The error rate on new cases is called the generalization error (or out-of-sample error), and
by evaluating your model on the test set, you get an estimate of this error.
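
A minimal sketch of an 80/20 split and a test-set estimate of the generalization error. The dataset is synthetic and the helpers `fit_line` and `mse` are hypothetical names; only the split itself reflects the procedure described above.

```python
import random

rng = random.Random(4)

# Hypothetical dataset: 100 points from y = 1.5x + 4 plus unit-variance noise.
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    data.append((x, 1.5 * x + 4 + rng.gauss(0, 1)))

# Shuffle, then keep 80% for training and hold out 20% for testing.
rng.shuffle(data)
split = len(data) * 8 // 10
train, test = data[:split], data[split:]


def fit_line(points):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, points):
    """Mean squared error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)


model = fit_line(train)       # the test set is never used for fitting
train_mse = mse(model, train)
test_mse = mse(model, test)   # estimate of the generalization error

print(f"train size={len(train)}, test size={len(test)}")
print(f"training error: {train_mse:.2f}, estimated generalization error: {test_mse:.2f}")
```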

Hyperparameter Tuning and model selection
Suppose you are hesitating between two types of models (say, a linear model and a
polynomial model): how can you decide between them?

When you want to compare just two models: train both on the same training data and
compare their generalization performance on the test data.
When you want to find the best-performing hyperparameter value among 100 options: you
cannot do the same.
- When you measure the generalization error multiple times on the test set, you
adapt the model and hyperparameters to produce the best model for that
particular set, so it won’t perform as well on new data.
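
The effect of picking a winner on the test set can be illustrated with a deliberately extreme sketch: the labels below are coin flips, so no model can truly do better than 50% accuracy. The 200 random linear classifiers stand in for "many hyperparameter options"; all names and sizes are assumptions chosen for illustration.

```python
import random

rng = random.Random(5)
N_FEATURES = 20


def make_points(n):
    """Features are Gaussian noise; labels are coin flips unrelated to the features."""
    return [([rng.gauss(0, 1) for _ in range(N_FEATURES)], rng.randint(0, 1))
            for _ in range(n)]


def accuracy(weights, points):
    """Share of points where the sign of the weighted sum matches the label."""
    correct = 0
    for features, label in points:
        score = sum(w * f for w, f in zip(weights, features))
        correct += int((score > 0) == (label == 1))
    return correct / len(points)


test_set = make_points(40)
fresh_data = make_points(5000)  # stands in for truly new cases

# 200 hypothetical candidates: random linear classifiers.
candidates = [[rng.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(200)]

# Selecting the winner on the test set makes its test score optimistic.
best = max(candidates, key=lambda w: accuracy(w, test_set))
best_test_acc = accuracy(best, test_set)
fresh_acc = accuracy(best, fresh_data)

print(f"accuracy of the selected model on the test set: {best_test_acc:.2f}")
print(f"accuracy of the same model on fresh data:       {fresh_acc:.2f}")
```

The selected model looks clearly better than chance on the test set, yet falls back to roughly 50% on fresh data, because the test set was used for the selection itself.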

A common solution to this problem is called holdout validation: you simply hold out part
of the training set to evaluate several candidate models and select the best one. The new
held-out set is called the validation set (or sometimes the development set, or dev set).

Process
1. You train multiple models with various hyperparameters on the reduced training
set.
2. You select the model that performs best on the validation set (holdout validation
process).
a. If the model performs poorly on held-out training data (the train-dev set), then it
must have overfit the training set, so you should try to simplify or regularize the model, get
more training data, and clean up the training data.
3. You train the best model on the full training set, including the validation set.
4. You test the generalization error on the test set.
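
The four steps above can be sketched as follows. As an assumption for illustration, the model is a simple k-nearest-neighbours regressor written from scratch, and the hyperparameter being tuned is the number of neighbours `k`; the data is synthetic and all names are hypothetical.

```python
import math
import random

rng = random.Random(3)

# Hypothetical dataset: y = sin(x) plus noise.
data = []
for _ in range(200):
    x = rng.uniform(0, 6)
    data.append((x, math.sin(x) + rng.gauss(0, 0.3)))

# Hold out a test set first, then carve a validation set out of the training set.
test = data[:40]
train_full = data[40:]
validation = train_full[:40]
train_reduced = train_full[40:]


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, eval_set, k):
    """Mean squared error on eval_set of a k-NN model that uses train as its memory."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in eval_set) / len(eval_set)


# Step 1: "train" one model per hyperparameter value on the reduced training set.
candidates = [1, 3, 5, 10, 20, 40]
val_errors = {k: mse(train_reduced, validation, k) for k in candidates}

# Step 2: select the value that performs best on the validation set.
best_k = min(val_errors, key=val_errors.get)

# Step 3: retrain on the full training set (for k-NN this means using all of it).
# Step 4: measure the generalization error once, on the untouched test set.
test_error = mse(train_full, test, best_k)

print(f"validation errors: {val_errors}")
print(f"selected k={best_k}, test error={test_error:.3f}")
```

The test set is touched exactly once, after the hyperparameter has been fixed, so its error remains an honest estimate.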

Validation set should not be too small: model evaluations will be imprecise.
Validation set should not be too large: the remaining training set will be much smaller than
the full training set, so the performance measured during selection may not carry over to the
model retrained on the full training set.

One way to solve this problem is repeated cross-validation, which uses many small validation
sets. Each model is evaluated once per validation set after it is trained on the rest of the
data. By averaging all the evaluations of a model, you get a much more accurate measure of
its performance.
- The drawback is that training time is multiplied by the number of validation sets.
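
A minimal sketch of cross-validation with five folds, again assuming a from-scratch k-nearest-neighbours regressor on synthetic data (the names `cross_val_mse`, `knn_predict`, and `mse` are hypothetical). Each fold serves as the validation set exactly once, so the model is fitted five times.

```python
import math
import random
import statistics

rng = random.Random(6)

# Hypothetical dataset: y = sin(x) plus noise, with x drawn at random.
data = []
for _ in range(100):
    x = rng.uniform(0, 6)
    data.append((x, math.sin(x) + rng.gauss(0, 0.3)))


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, eval_set, k):
    """Mean squared error on eval_set of a k-NN model that uses train as its memory."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in eval_set) / len(eval_set)


def cross_val_mse(points, k, n_folds=5):
    """Return one validation error per fold; each point is validated on exactly once."""
    folds = [points[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, validation in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(mse(train, validation, k))
    return scores


fold_scores = cross_val_mse(data, k=5)
cv_score = statistics.fmean(fold_scores)

print("per-fold errors:", [round(s, 3) for s in fold_scores])
print(f"cross-validated error: {cv_score:.3f}")
```

The spread of the per-fold errors also gives a rough sense of how imprecise a single small validation set would have been.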

No Free Lunch Theorem
David Wolpert demonstrated that if you make absolutely no assumption about the data,
then there is no reason to prefer one model over any other. This is called the No Free
Lunch (NFL) theorem.

Linked book

Gareth James, Daniela Witten, An Introduction to Statistical Learning

Edition: unknown
ISBN: 9781071614204

School, program & course

Institution: Erasmus Universiteit Rotterdam (EUR)
Program: Business Analytics and Management
Course: Machine Learning & Learning Algorithms (BM05BAM)

Document information

Uploaded on: 12 March 2024
Number of pages: 10
Written in: 2023/2024
Type: Notes
Professor(s): Jason Roos
Contains: All classes

Topics

machine learning
parametric
non parametric
supervised learning
unsupervised learning
regression
classification
statistical learning
islr2
islr
ml

Seller: ArisMaya
