Summary

Full summary: An Introduction to Statistical Learning - Statistical Learning (6013B0357Y) || UvA Econometrics and Data Science, Actuarial Science

Pages: 20
Uploaded on: 20-04-2025
Written in: 2023/2024

This summary contains all you need to know for your Statistical Learning final from the book 'An Introduction to Statistical Learning'. It is meant for students in the BSc Econometrics and Data Science, BSc Actuarial Science, Minor Actuarial Science, Premaster Econometrics or Premaster Actuarial Science and Mathematical Finance.


Document information

Whole book summarized?: No
Chapters summarized: 2-8
Type: Summary

Preview of the content

2 Statistical Learning
Why estimate ƒ?
1. Prediction
● Make predictions of Y with Ŷ = ƒ̂(X)
● The accuracy of Ŷ depends on
  ○ Reducible error: how far ƒ̂ is from the true ƒ; can be reduced with a better estimate
  ○ Irreducible error: the noise ε, which cannot be predicted from X
  ○ E[(Y − Ŷ)²] = [ƒ(X) − ƒ̂(X)]² + Var(ε)
2. Inference
● Understand the association between Y and X1, …, Xp
● Estimate ƒ to answer:
  ○ Which predictors are associated with the response?
  ○ What is the relationship between the response and each predictor?
  ○ Is the relationship linear or more complicated?



Measuring the quality of fit
● For regression we use the mean squared error:
  ○ MSE = (1/n) Σᵢ (yᵢ − ƒ̂(xᵢ))²
  ○ Using the training set, we get the training MSE
  ○ Test set → test MSE
  ○ As flexibility increases, the training MSE goes down, but the test MSE follows a U-shape
    ■ Overfitting: a low training MSE combined with a high test MSE
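The MSE formula above can be sketched directly; the data points and model predictions below are made-up toy values, used only to show a training MSE next to a (typically larger) test MSE:

```python
# Minimal sketch of the MSE computation: (1/n) * sum of (y_i - f_hat(x_i))^2.
# The data and predictions are illustrative toy values, not from the book.

def mse(y, y_hat):
    """Mean squared error between observed values y and predictions y_hat."""
    return sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / len(y)

# Hypothetical model predictions on the data it was trained on vs. unseen data:
y_train, pred_train = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
y_test, pred_test = [4.0, 5.0], [3.5, 5.8]

print("training MSE:", mse(y_train, pred_train))
print("test MSE:", mse(y_test, pred_test))
```

On this toy data the test MSE exceeds the training MSE, the pattern overfitting produces.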



Bias-Variance Tradeoff
● The expected test MSE at a point x₀ decomposes as
  ○ E[(y₀ − ƒ̂(x₀))²] = Var(ƒ̂(x₀)) + [Bias(ƒ̂(x₀))]² + Var(ε)
  ■ Variance: the amount by which ƒ̂ would change if we estimated it using a different training set
    ● More flexible methods have higher variance
  ■ Bias: the error that is introduced by approximating a real-life problem by a much simpler model
    ● More flexible methods have lower bias
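The variance half of the tradeoff can be illustrated with a small simulation. This is a sketch under my own assumptions (not an example from the book): the inflexible estimator is the sample mean (it ignores x entirely), the very flexible one is the single training point nearest to x₀, and we compare how much each prediction varies across many simulated training sets:

```python
import random

# Assumption: true f(x) = x with Gaussian noise; "rigid" = sample mean,
# "flexible" = 1-nearest-neighbour prediction at x0. Illustrative only.

def simulate(n_sets=500, n=30, x0=0.5, seed=0):
    rng = random.Random(seed)
    rigid, flexible = [], []
    for _ in range(n_sets):
        xs = [rng.random() for _ in range(n)]
        ys = [x + rng.gauss(0, 0.3) for x in xs]    # training set drawn anew
        rigid.append(sum(ys) / n)                   # inflexible: ignores x
        i = min(range(n), key=lambda j: abs(xs[j] - x0))
        flexible.append(ys[i])                      # flexible: nearest point to x0

    def var(v):
        m = sum(v) / len(v)
        return sum((u - m) ** 2 for u in v) / len(v)

    return var(rigid), var(flexible)

var_rigid, var_flexible = simulate()
print("rigid:", var_rigid, "flexible:", var_flexible)
```

Across training sets, the flexible estimator's predictions vary far more than the rigid one's, exactly as the tradeoff predicts.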

5 Resampling Methods
The Validation Set Approach
1. Randomly divide the available set of observations into a training set and a validation set (hold-out set).
2. Fit the model to the training set.
3. Predict the responses in the validation set.
4. The resulting validation set error rate provides an estimate of the test error rate.

- The validation estimate of the test error can be highly variable, depending on which observations end up in which set.
- Only a subset of the observations is used to fit the model, and statistical methods tend to perform worse when trained on fewer observations.
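The four steps above can be sketched as follows; the toy model y = b·x fit by least squares through the origin and the alternating "noise" are my own illustrative choices, not the book's example:

```python
import random

# Sketch of the validation-set approach on made-up data.

def validation_set_error(data, seed):
    rng = random.Random(seed)
    pts = data[:]
    rng.shuffle(pts)                                  # step 1: random 50/50 split
    half = len(pts) // 2
    train, valid = pts[:half], pts[half:]
    # step 2: "fit" a toy model y = b*x by least squares through the origin
    b = sum(x * y for x, y in train) / sum(x * x for x, y in train)
    # steps 3-4: predict on the validation set and report its MSE
    return sum((y - b * x) ** 2 for x, y in valid) / len(valid)

data = [(float(x), 2.0 * x + (-1) ** x) for x in range(1, 21)]
# Different random splits give different error estimates (the variability noted above):
print([round(validation_set_error(data, s), 3) for s in range(3)])
```

Rerunning with different seeds shows the first drawback in action: the estimate depends heavily on the particular split.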

Leave-One-Out Cross-Validation (LOOCV)
1. Split the observations into two parts: the training set with all but one observation, and the validation set with a single observation (x1, y1).
2. The method is fit on the n − 1 observations and the remaining observation is predicted.
3. MSE1 is calculated.
4. The procedure is repeated n − 1 more times, with each of the other observations serving once as the validation set:

CV(n) = (1/n) Σᵢ MSEᵢ

+ Far less bias than the validation set approach.
+ Always yields the same result when repeated: there is no randomness in the set splits.
- Can be expensive: the model has to be fit n times, which is time-consuming for large n.

Shortcut for least squares linear or polynomial regression (a single fit suffices):

CV(n) = (1/n) Σᵢ [(yᵢ − ŷᵢ)/(1 − hᵢ)]²

where hᵢ is the leverage of observation i.
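The shortcut can be verified numerically for simple linear regression y = a + b·x, where the leverage is hᵢ = 1/n + (xᵢ − x̄)²/Σⱼ(xⱼ − x̄)². The toy data below are made up; the point is that the one-fit shortcut matches the n-fit brute-force LOOCV exactly:

```python
# Sketch: brute-force LOOCV vs. the leverage shortcut for OLS y = a + b*x.
# Toy data; the two computations agree to floating-point precision.

def fit_ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b * xbar, b

def loocv_brute(xs, ys):
    """CV(n) by refitting n times, leaving one observation out each time."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        a, b = fit_ols(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / n

def loocv_shortcut(xs, ys):
    """CV(n) from a single fit, via the leverage h_i of each observation."""
    n = len(xs)
    a, b = fit_ols(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = 1 / n + (x - xbar) ** 2 / sxx
        total += ((y - (a + b * x)) / (1 - h)) ** 2
    return total / n

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.3, 3.8, 5.1]
print(loocv_brute(xs, ys), loocv_shortcut(xs, ys))  # identical values
```

This identity is what makes LOOCV essentially free for least squares fits.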

k-Fold Cross-Validation
1. Randomly divide the set of observations into k groups (folds) of approximately equal size.
2. The first fold is treated as a validation set, and the method is fit on the remaining k − 1 folds.
3. MSE1 is calculated on the held-out fold.
4. This procedure is repeated k times, each time holding out a different fold.
5. The k-fold CV estimate is calculated:

CV(k) = (1/k) Σᵢ MSEᵢ

+ Shorter computation than LOOCV, since the model has to be fit only k (usually 5 or 10) times instead of n times.
+ Often gives more accurate estimates of the test error rate than LOOCV.
- There is an intermediate level of bias (with k = 5 or k = 10, each training set contains (k − 1)n/k observations), more than LOOCV.
+ LOOCV has higher variance than k-fold with k < n, because the k training sets overlap less with each other than the nearly identical n − 1-observation training sets of LOOCV. The mean of many highly correlated quantities has higher variance than the mean of quantities that are less correlated.
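The five steps above can be sketched as follows, reusing the illustrative through-the-origin model y = b·x (my own choice of toy model, not the book's example):

```python
import random

# Minimal sketch of k-fold cross-validation on made-up data.

def k_fold_cv(data, k, seed=0):
    rng = random.Random(seed)
    pts = data[:]
    rng.shuffle(pts)
    folds = [pts[i::k] for i in range(k)]          # step 1: k roughly equal folds
    total = 0.0
    for i in range(k):                             # steps 2-4: hold out each fold once
        valid = folds[i]
        train = [p for j in range(k) if j != i for p in folds[j]]
        b = sum(x * y for x, y in train) / sum(x * x for x, y in train)
        total += sum((y - b * x) ** 2 for x, y in valid) / len(valid)
    return total / k                               # step 5: average the k fold MSEs

data = [(float(x), 2.0 * x + (-1) ** x) for x in range(1, 21)]
print(round(k_fold_cv(data, k=5), 3))
```

With k = 5 the model is fit only 5 times, versus 20 fits for LOOCV on the same 20 observations.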