Why estimate ƒ?
1. Prediction
● Make predictions of Y: Ŷ = ƒ̂(X), where ƒ̂ is treated as a black box
● The accuracy of Ŷ depends on
○ Reducible error: ƒ̂ is an imperfect estimate of ƒ; this error can potentially be reduced by using a better estimation technique
○ Irreducible error: ε cannot be predicted from X, so it places a limit on the accuracy of Ŷ
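These two terms come from the standard decomposition of the expected squared prediction error (treating ƒ̂ and X as fixed):

```latex
E\big[(Y - \hat{Y})^2\big]
  = \underbrace{[f(X) - \hat{f}(X)]^2}_{\text{reducible}}
  + \underbrace{\mathrm{Var}(\varepsilon)}_{\text{irreducible}}
```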
2. Inference
● Understand the association between Y and X1, …, Xp
● Estimate ƒ
○ Which predictors are associated with the response?
○ What is the relationship between the response and each predictor?
○ Is the relationship linear or more complicated?
Measuring the quality of fit
● For regression we use the mean squared error: MSE = (1/n) Σᵢ (yᵢ − ƒ̂(xᵢ))²
○ Computed on the training set, this gives the training MSE
○ Computed on a test set, the test MSE
○ As flexibility increases, the training MSE decreases monotonically, but the test MSE traces out a U-shape
■ A low training MSE paired with a high test MSE indicates overfitting
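This behaviour is easy to reproduce with a small simulation — a sketch assuming a sine-shaped true ƒ and polynomial fits of increasing degree (flexibility):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 50)
y = np.sin(x) + rng.normal(0, 0.3, 50)            # Y = f(X) + eps, with f = sin
x_test = rng.uniform(-3, 3, 200)
y_test = np.sin(x_test) + rng.normal(0, 0.3, 200)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

train_mse, test_mse = [], []
for degree in range(1, 11):                        # increasing flexibility
    coefs = np.polyfit(x, y, degree)
    train_mse.append(mse(y, np.polyval(coefs, x)))
    test_mse.append(mse(y_test, np.polyval(coefs, x_test)))
# train_mse is non-increasing in degree (nested least-squares fits)
```

Plotting `train_mse` and `test_mse` against `degree` typically shows the training curve falling monotonically while the test curve bottoms out and rises again.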
Bias-Variance Tradeoff
○ Expected test MSE decomposes as E[(y₀ − ƒ̂(x₀))²] = Var(ƒ̂(x₀)) + [Bias(ƒ̂(x₀))]² + Var(ε)
■ Variance: the amount by which ƒ̂ would change if we estimated it
using a different training set
● More flexible methods have higher variance
■ Bias: the error that is introduced by approximating a real-life problem
● More flexible models have lower bias
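The variance half of the tradeoff can be illustrated by refitting a rigid and a flexible model on many fresh training sets and watching how much their predictions at a fixed point move (a sketch; the sine-shaped ƒ, sample size, and degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_and_predict(degree, x0=0.5, n=20):
    """Draw a fresh training set, fit a degree-`degree` polynomial,
    and return its prediction at the fixed point x0."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.3, n)   # true f(x) = sin(3x)
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x0)

# Variance of f-hat(x0) across 300 independent training sets
var_rigid = np.var([fit_and_predict(1) for _ in range(300)])
var_flexible = np.var([fit_and_predict(10) for _ in range(300)])
# The flexible fit's prediction swings far more from training set to training set
```

Comparing the average predictions to the true value sin(3 · 0.5) in the same simulation would show the flip side: the rigid model carries the larger bias.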
Resampling Methods
The Validation Set Approach
1. Randomly divide the available set of observations into a training set and a
validation set (hold-out set).
2. Fit the model to the training set.
3. Predict the responses in the validation set.
4. The resulting validation set error rate provides an estimate of the test error rate.
- The validation estimate of the test error can be highly variable, depending on which
observations end up in the training set versus the validation set.
- Only a subset of the observations is used to fit the model, and statistical methods tend to
perform worse when trained on fewer observations, so the validation error may overestimate the test error.
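A minimal sketch of the approach using a plain numpy 50/50 split (the cubic true ƒ and noise level are illustrative assumptions); rerunning it with different seeds shows the variability noted above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-2, 2, n)
y = x**3 - 2 * x + rng.normal(0, 0.5, n)

def validation_mse(seed, degree=3):
    """One random 50/50 train/validation split; fit on train, score on validation."""
    idx = np.random.default_rng(seed).permutation(n)
    train, val = idx[: n // 2], idx[n // 2 :]
    coefs = np.polyfit(x[train], y[train], degree)
    return float(np.mean((y[val] - np.polyval(coefs, x[val])) ** 2))

# The test-error estimate changes noticeably from split to split
estimates = [validation_mse(seed) for seed in range(10)]
```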
Leave-One-Out Cross-Validation (LOOCV)
1. Split the observations into two parts: a training set with all but one observation, and
a validation set containing the single held-out observation (x1, y1).
2. The method is fit on the n - 1 observations and the remaining observation is
predicted.
3. MSE₁ = (y₁ − ŷ₁)² is calculated.
4. The procedure is repeated n − 1 more times, each time with a different observation as the
validation set; the LOOCV estimate is the average CV(n) = (1/n) Σᵢ MSEᵢ.
+ Far less bias than the validation set approach, since each training set contains n − 1 observations.
+ Always yields the same result when repeated: there is no randomness in the splits.
- Can be expensive: the model has to be fit n times, which is time consuming for large n.
Shortcut for least squares linear or polynomial regression (only one fit needed):
CV(n) = (1/n) Σᵢ ((yᵢ − ŷᵢ) / (1 − hᵢ))², where hᵢ is the leverage of observation i.
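The shortcut can be checked against the brute-force loop — for least squares the two agree exactly (a numpy sketch on simulated data):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 2 * x + 1 + rng.normal(size=30)
X = np.column_stack([np.ones(30), x])          # design matrix with intercept

# Full least-squares fit and leverages h_i = diag(H), H = X (X'X)^-1 X'
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Shortcut: CV(n) = (1/n) * sum(((y_i - yhat_i) / (1 - h_i))^2)
cv_shortcut = float(np.mean((resid / (1 - h)) ** 2))

# Brute-force LOOCV: refit n times, each time leaving one observation out
errs = []
for i in range(30):
    mask = np.arange(30) != i
    b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    errs.append((y[i] - X[i] @ b) ** 2)
cv_loop = float(np.mean(errs))
```

The identity holds because (yᵢ − ŷᵢ)/(1 − hᵢ) is exactly the residual of observation i under the fit that leaves it out.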
k-Fold Cross-Validation
1. Randomly divide the set of observations into k groups (folds) of approximately equal
size.
2. The first fold is treated as a validation set, and the method is fit on the remaining k - 1
folds.
3. MSE₁ is computed on the observations in the held-out fold.
4. This procedure is repeated k times, each time treating a different fold as the validation set.
5. The k-fold CV estimate is calculated: CV(k) = (1/k) Σᵢ MSEᵢ
+ Shorter computation than LOOCV, since the model has to be fitted only k (usually 5 or 10) times
instead of n times.
+ Often gives more accurate estimates of the test error rate than LOOCV.
- There is an intermediate level of bias: with k = 5 or k = 10, each training set
contains roughly (k − 1)n/k observations, fewer than the n − 1 used by LOOCV, so the bias is higher than LOOCV's.
+ LOOCV has higher variance than k-fold CV with k < n. The n LOOCV training sets
overlap almost completely, so the n fitted models are highly correlated; the
k-fold training sets overlap less, so their fitted models are less correlated.
The mean of many highly correlated quantities has higher variance than the mean
of quantities that are less correlated.
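The five steps above can be sketched directly in numpy (the data-generating model and polynomial degree are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 5
x = rng.uniform(-2, 2, n)
y = np.sin(2 * x) + rng.normal(0, 0.3, n)

# Step 1: randomly assign observations to k folds of equal size
folds = rng.permutation(n) % k

fold_mses = []
for fold in range(k):
    val = folds == fold                            # step 2: hold out one fold
    coefs = np.polyfit(x[~val], y[~val], 5)        # fit on the remaining k-1 folds
    mse_i = float(np.mean((y[val] - np.polyval(coefs, x[val])) ** 2))
    fold_mses.append(mse_i)                        # steps 3-4: MSE_i for each fold

cv_k = float(np.mean(fold_mses))                   # step 5: CV(k) = (1/k) * sum MSE_i
```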