Summary

Full summary: An Introduction to Statistical Learning - Statistical Learning (6013B0357Y) || UvA Econometrics and Data Science, Actuarial Science

Pages: 20
Uploaded on: 20-04-2025
Written in: 2023/2024

This summary contains all you need to know for your Statistical Learning final from the book 'An Introduction to Statistical Learning'. It is meant for students in the BSc Econometrics and Data Science, BSc Actuarial Science, Minor Actuarial Science, Premaster Econometrics or Premaster Actuarial Science and Mathematical Finance.


Document information

Summarized whole book? No
Which chapters are summarized? Chapters 2-8
Type: Summary

Content preview

2 Statistical Learning
Why estimate ƒ?
1. Prediction
● Make predictions of Y: Ŷ = ƒ̂(X)
● The accuracy of Ŷ depends on
○ Reducible error: the error from ƒ̂ being an imperfect estimate of ƒ; it can be reduced by choosing a better learning method
○ Irreducible error: ε, which cannot be predicted from X
○ E[(Y - Ŷ)²] = [ƒ(X) - ƒ̂(X)]² + Var(ε)
2. Inference
●​ Understand the association between Y and X1, …, Xp
●​ Estimate ƒ
○​ Which predictors are associated with the response?
○​ What is the relationship between the response and each predictor?
○​ Is the relationship linear or more complicated?
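The reducible/irreducible split can be illustrated numerically. The sketch below is an assumption-laden toy example (true ƒ(x) = sin(x), noise sd 0.3, and a deliberately too-simple linear model as ƒ̂; none of these choices come from the text): the test MSE estimates reducible error + Var(ε), so it cannot fall below the noise variance.

```python
import numpy as np

# Hypothetical setup: true f(x) = sin(x), noise sd 0.3, linear f-hat.
rng = np.random.default_rng(0)
sigma = 0.3
x_tr = rng.uniform(-2, 2, 200)
y_tr = np.sin(x_tr) + rng.normal(0, sigma, 200)

coef = np.polyfit(x_tr, y_tr, 1)  # f-hat: deliberately too simple

x_te = rng.uniform(-2, 2, 2000)
y_te = np.sin(x_te) + rng.normal(0, sigma, 2000)

# Estimates E[(Y - Y-hat)^2] = reducible error + Var(eps);
# it stays above sigma**2 no matter how good f-hat is.
test_mse = np.mean((y_te - np.polyval(coef, x_te)) ** 2)
```

Swapping in a more flexible ƒ̂ shrinks the reducible part, but `test_mse` never drops below `sigma ** 2`.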



Measuring the quality of fit
● For regression we use the mean squared error: MSE = (1/n) Σᵢ (yᵢ - ƒ̂(xᵢ))²
○ Computed on the training set → training MSE
○ Computed on the test set → test MSE
○ As flexibility increases, the training MSE goes down, but the test MSE follows a U-shape
■ A low training MSE combined with a high test MSE indicates overfitting
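This train/test divergence is easy to reproduce. A minimal numpy sketch, assuming polynomial degree as a stand-in for flexibility and sin(x) plus noise as the data-generating process (both are illustrative choices, not from the text): training MSE falls monotonically with degree, while the test MSE is minimized at an intermediate degree.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3
x_tr = rng.uniform(-2, 2, 50)
y_tr = np.sin(x_tr) + rng.normal(0, sigma, 50)
x_te = rng.uniform(-2, 2, 1000)
y_te = np.sin(x_te) + rng.normal(0, sigma, 1000)

train_mse, test_mse = [], []
for deg in range(1, 10):  # polynomial degree = flexibility
    coef = np.polyfit(x_tr, y_tr, deg)
    train_mse.append(np.mean((y_tr - np.polyval(coef, x_tr)) ** 2))
    test_mse.append(np.mean((y_te - np.polyval(coef, x_te)) ** 2))

# train_mse only decreases; test_mse is lowest at a moderate degree.
```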



Bias-Variance Tradeoff
○ The expected test MSE at a point x₀ decomposes as E[(y₀ - ƒ̂(x₀))²] = Var(ƒ̂(x₀)) + [Bias(ƒ̂(x₀))]² + Var(ε)
■ Variance: the amount by which ƒ̂ would change if we estimated it using a different training set
● More flexible methods have higher variance
■ Bias: the error that is introduced by approximating a real-life problem by a much simpler model
● More flexible methods have lower bias
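Variance in this sense can be measured directly by refitting ƒ̂ on many independent training sets and looking at the spread of its predictions at one point. A sketch under assumed settings (sin(x) as the true ƒ, degrees 1 and 9 as "inflexible" vs "flexible"; these are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def fhat_at_x0(deg, x0=0.0, n=30, sims=200):
    """Refit f-hat on `sims` independent training sets; collect f-hat(x0)."""
    preds = []
    for _ in range(sims):
        x = rng.uniform(-2, 2, n)
        y = np.sin(x) + rng.normal(0, 0.5, n)  # sin is a stand-in true f
        coef = np.polyfit(x, y, deg)
        preds.append(np.polyval(coef, x0))
    return np.array(preds)

var_simple = fhat_at_x0(1).var()  # inflexible: low variance, high bias
var_flex = fhat_at_x0(9).var()    # flexible: high variance, low bias
```

The degree-9 fit changes far more from one training set to the next, which is exactly the "variance" term of the tradeoff.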

5 Resampling Methods
The Validation Set Approach
1.​ Randomly divide the available set of observations into a training set and a
validation set (hold-out set).
2.​ Fit the model to the training set.
3.​ Predict the responses in the validation set.
4.​ The resulting validation set error rate provides an estimate of the test error rate.

- The validation estimate of the test error can be highly variable, depending on which
observations are in which set.
- Only a subset of the observations is used to fit the model, and statistical methods tend to
perform worse when trained on fewer observations.
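The four steps above, and the variability drawback, can be sketched in a few lines of numpy. Everything here is an illustrative assumption (50/50 split, a degree-3 polynomial as the model, sin(x) plus noise as the data): repeating the procedure with different random splits gives noticeably different test-error estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(-2, 2, n)
y = np.sin(x) + rng.normal(0, 0.3, n)

def validation_mse(seed):
    """One run of the validation set approach for a degree-3 fit."""
    idx = np.random.default_rng(seed).permutation(n)   # step 1: random split
    tr, va = idx[: n // 2], idx[n // 2 :]
    coef = np.polyfit(x[tr], y[tr], 3)                 # step 2: fit on training set
    resid = y[va] - np.polyval(coef, x[va])            # step 3: predict hold-out set
    return np.mean(resid ** 2)                         # step 4: validation MSE

# Different splits -> different estimates of the same test error.
estimates = [validation_mse(s) for s in range(10)]
```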

Leave-One-Out Cross-Validation (LOOCV)
1. Split the observations into two parts: the training set with all but one observation, and
the validation set with the single observation (x₁, y₁).
2. The method is fit on the n - 1 training observations and the held-out observation is
predicted.
3. MSE₁ = (y₁ - ŷ₁)² is calculated.
4. The procedure is repeated n - 1 more times, each time holding out a different
observation, and the LOOCV estimate is CV(n) = (1/n) Σᵢ MSEᵢ.




+ Far less bias than the validation set approach.
+ Always yields the same results when repeated: there is no randomness in the splits.
- Can be expensive: the model has to be fit n times, which is time-consuming for large n.
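The LOOCV loop is short to write out explicitly. A minimal sketch, assuming a degree-3 polynomial fit and sin(x) plus noise as data (illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(-2, 2, n)
y = np.sin(x) + rng.normal(0, 0.3, n)

def loocv_mse(deg):
    """Fit on n-1 points, predict the held-out one, average the n squared errors."""
    errs = []
    for i in range(n):
        mask = np.arange(n) != i                  # leave observation i out
        coef = np.polyfit(x[mask], y[mask], deg)  # fit on the other n-1
        errs.append((y[i] - np.polyval(coef, x[i])) ** 2)
    return np.mean(errs)                          # CV(n)

cv_n = loocv_mse(3)
```

Note the cost: the model is refit `n` times, which is exactly the drawback listed above.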

Shortcut using least squares linear or polynomial regression:

CV(n) = (1/n) Σᵢ [(yᵢ - ŷᵢ) / (1 - hᵢ)]²

→ hᵢ is the leverage of observation i, and ŷᵢ is the fitted value from the single fit on all n
observations, so only one fit is needed instead of n.
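For least squares the leverages are the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ, so the shortcut needs only one fit. The sketch below (a cubic design matrix on assumed sin-plus-noise data) checks that the shortcut reproduces the brute-force n-fit loop:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(-2, 2, n)
y = np.sin(x) + rng.normal(0, 0.3, n)

X = np.vander(x, 4)                    # cubic design matrix [x^3, x^2, x, 1]
H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix
h = np.diag(H)                         # leverages h_i
resid = y - H @ y                      # residuals from the single full fit
cv_shortcut = np.mean((resid / (1 - h)) ** 2)

# Brute-force LOOCV (n separate fits) for comparison.
errs = []
for i in range(n):
    mask = np.arange(n) != i
    coef = np.polyfit(x[mask], y[mask], 3)
    errs.append((y[i] - np.polyval(coef, x[i])) ** 2)
cv_loop = np.mean(errs)
```

The two agree to numerical precision: the identity (yᵢ - ŷ₍ᵢ₎) = (yᵢ - ŷᵢ)/(1 - hᵢ) is exact for least squares.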

k-Fold Cross Validation
1. Randomly divide the set of observations into k groups (folds) of approximately equal
size.
2. The first fold is treated as a validation set, and the method is fit on the remaining k - 1
folds.
3. MSE₁ is computed on the held-out fold.
4. This procedure is repeated k times, each time with a different fold as the validation set.
5. The k-fold CV estimate is CV(k) = (1/k) Σᵢ MSEᵢ.

+ Shorter computation than LOOCV, since the model has to be fit k (usually 5 or 10) times
instead of n times.
+ Often gives more accurate estimates of the test error rate than LOOCV.
- There is an intermediate level of bias (with k = 5 or k = 10, each training set
contains (k - 1)n/k observations): more than LOOCV, but less than the validation set
approach.
+ LOOCV has higher variance than k-fold CV with k < n: the n models in LOOCV are
trained on nearly identical datasets, so their outputs are highly correlated, and the
mean of many highly correlated quantities has higher variance than the mean of
quantities that are less correlated.
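The five steps can be sketched directly. As before, the degree-3 model and the sin-plus-noise data are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(-2, 2, n)
y = np.sin(x) + rng.normal(0, 0.3, n)

def kfold_cv(deg, k=5):
    """Each fold is held out once; CV(k) is the mean of the k fold MSEs."""
    idx = rng.permutation(n)             # step 1: random fold assignment
    folds = np.array_split(idx, k)
    mses = []
    for fold in folds:                   # steps 2-4: hold out each fold in turn
        mask = np.ones(n, dtype=bool)
        mask[fold] = False
        coef = np.polyfit(x[mask], y[mask], deg)
        mses.append(np.mean((y[fold] - np.polyval(coef, x[fold])) ** 2))
    return np.mean(mses)                 # step 5: CV(k)

cv_5 = kfold_cv(3)
```

With k = 5 the model is fit 5 times rather than n = 100 times, which is the computational advantage noted above.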