Summary: An Introduction to Statistical Learning, Basics

Pages: 7
Uploaded on: 24-06-2022
Written in: 2020/2021

ISBN: 9781461471370. Big Data Analysis (7204MM17XY): Chapters 1 to 7

Chapter 2
There are two reasons to estimate f:

- Prediction
- Inference

Parametric methods

- Reduce estimating f to estimating a few parameters, e.g. the coefficients of a linear function, which is easy
- The chosen model will usually not match the true, unknown form of f

Non-parametric methods

- Avoid the (possibly wrong) assumption of a particular functional form for f
- A large number of observations is required to obtain an accurate estimate of f
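A minimal sketch of this trade-off, assuming NumPy and made-up data: a straight-line fit stands in for a parametric method and a K-nearest-neighbours average for a non-parametric one. The true functions (a hypothetical line 2x + 1 and `np.sin`), the sample sizes, the noise levels and the helper names `fit_linear`, `fit_knn` and `mse_vs_truth` are illustrative choices, not taken from the notes.

```python
import numpy as np

def fit_linear(x, y):
    # Parametric: assume f(x) = b0 + b1 * x, so only two parameters are estimated
    b1, b0 = np.polyfit(x, y, deg=1)
    return lambda x_new: b0 + b1 * np.asarray(x_new)

def fit_knn(x, y, k):
    # Non-parametric: predict the mean response of the k nearest training points
    def predict(x_new):
        x_new = np.atleast_1d(x_new)
        dist = np.abs(x_new[:, None] - x[None, :])
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def mse_vs_truth(model, f_true, x_grid):
    # Average squared gap between the estimate f_hat and the true f on a grid
    return float(np.mean((model(x_grid) - f_true(x_grid)) ** 2))

rng = np.random.default_rng(0)
x_grid = np.linspace(0.5, 9.5, 200)

# Case 1: the true f is linear and there are only 10 observations
f_line = lambda x: 2 * x + 1  # hypothetical true f
x_small = rng.uniform(0, 10, 10)
y_small = f_line(x_small) + rng.normal(0, 0.5, x_small.size)
lin_small = mse_vs_truth(fit_linear(x_small, y_small), f_line, x_grid)
knn_small = mse_vs_truth(fit_knn(x_small, y_small, k=3), f_line, x_grid)

# Case 2: the true f is nonlinear and there are 500 observations
x_big = rng.uniform(0, 10, 500)
y_big = np.sin(x_big) + rng.normal(0, 0.1, x_big.size)
lin_big = mse_vs_truth(fit_linear(x_big, y_big), np.sin, x_grid)
knn_big = mse_vs_truth(fit_knn(x_big, y_big, k=10), np.sin, x_grid)

print(f"linear f, n=10:  linear {lin_small:.3f} vs KNN {knn_small:.3f}")
print(f"sin f,    n=500: linear {lin_big:.3f} vs KNN {knn_big:.3f}")
```

With few observations and a correct functional form, the two-parameter fit should win; with many observations and a nonlinear f, the assumption-free KNN average should win.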

Variance refers to the amount by which f̂ would change if we estimated it using a different training data set.

Bias refers to the error that is introduced by approximating a real-life problem, which may be
extremely complicated, by a much simpler model.
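To make both definitions concrete, here is a small Monte Carlo sketch, assuming NumPy. It draws many training sets, refits f̂ on each one, and measures at a single point x0 how much f̂(x0) varies across training sets (variance) and how far its average lies from the true value (bias). The true function `np.sin`, the noise level, the fixed design, the polynomial degrees and the helper `bias_variance_at` are hypothetical choices for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

def bias_variance_at(x0, degree, n_sims=500, n=30, noise_sd=0.3, seed=0):
    """Monte Carlo estimate of the squared bias and variance of f_hat(x0).

    Assumptions (illustration only): true f = sin, n equally spaced x values
    on [0, 2*pi], Gaussian noise, polynomial fits of the given degree.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 2 * np.pi, n)
    preds = np.empty(n_sims)
    for s in range(n_sims):
        # A fresh training set: same x values, new noise
        y = np.sin(x) + rng.normal(0, noise_sd, n)
        f_hat = Polynomial.fit(x, y, degree)
        preds[s] = f_hat(x0)
    bias_sq = (preds.mean() - np.sin(x0)) ** 2  # how far the average fit is off
    variance = preds.var()                      # how much the fit moves around
    return bias_sq, variance

x0 = np.pi / 2
for d in (1, 3, 10):
    b, v = bias_variance_at(x0, d)
    print(f"degree {d:2d}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

The rigid straight line should show a large squared bias and a small variance, while the flexible degree-10 polynomial should show the reverse.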

K-nearest neighbours (KNN)
When K = 1, the decision boundary is overly flexible and finds patterns in the data that don’t
correspond to the Bayes decision boundary. This corresponds to a classifier that has low bias but very
high variance.
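A minimal K-nearest-neighbours classifier in NumPy, assuming two overlapping Gaussian classes as made-up data. With K = 1 every training point is its own nearest neighbour, so the training error is exactly zero even though the classes overlap. This is the overly flexible behaviour described above. A larger K gives a smoother boundary. The helpers `knn_predict` and `make_data` are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k):
    """Majority vote among the k nearest training points (labels 0/1, k odd)."""
    # Euclidean distance from every new point to every training point
    dist = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    # With an odd k and two classes, the mean vote is never exactly 0.5
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def make_data(n, rng):
    # Hypothetical data: class 0 centred at (0, 0), class 1 at (1, 1)
    y = rng.integers(0, 2, n)
    X = rng.normal(0.0, 1.0, (n, 2)) + y[:, None]
    return X, y

rng = np.random.default_rng(42)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(1000, rng)

for k in (1, 15):
    train_err = np.mean(knn_predict(X_train, y_train, X_train, k) != y_train)
    test_err = np.mean(knn_predict(X_train, y_train, X_test, k) != y_test)
    print(f"K={k:2d}: training error {train_err:.3f}, test error {test_err:.3f}")
```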



The Bayes classifier produces the lowest possible test error rate, called the Bayes error rate.
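For a toy problem where the class distributions are known, the Bayes error rate can be computed exactly. The sketch below assumes a hypothetical setting with one predictor, X | Y = k ~ N(mu_k, sigma²) and equal priors. In that setting the Bayes classifier assigns x to the class whose mean is closer, and its error rate is Φ(−|mu_1 − mu_0| / (2 sigma)). A Monte Carlo check (assuming NumPy) is included. The function name is hypothetical.

```python
import math
import numpy as np

def bayes_error_two_gaussians(mu0, mu1, sigma):
    """Bayes error rate when X | Y=k ~ N(mu_k, sigma^2) and P(Y=0) = P(Y=1) = 0.5."""
    # With equal priors and equal variances the Bayes boundary is the midpoint
    # of the two means, and each class spills over it with probability Phi(-z).
    z = abs(mu1 - mu0) / (2.0 * sigma)
    return 0.5 * (1.0 + math.erf(-z / math.sqrt(2.0)))  # standard normal CDF at -z

# Monte Carlo check for mu0 = -1, mu1 = 1, sigma = 1 (hypothetical values)
rng = np.random.default_rng(1)
n = 200_000
y = rng.integers(0, 2, n)
x = rng.normal(np.where(y == 1, 1.0, -1.0), 1.0)
y_bayes = (x > 0.0).astype(int)  # Bayes classifier: pick the class with the closer mean
mc_error = np.mean(y_bayes != y)

print(f"exact: {bayes_error_two_gaussians(-1.0, 1.0, 1.0):.4f}")
print(f"simulated: {mc_error:.4f}")
```

No classifier trained on data from this distribution can beat roughly 0.159 on new data. That is the sense in which the Bayes error rate is a floor.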


Chapter 3




Curse of dimensionality: As the number of features/dimensions grows, the amount of data we need
to generalize accurately grows exponentially
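One common way to illustrate this assumes data spread uniformly over the unit hypercube [0, 1]^p, which is a hypothetical setting. To capture a fixed fraction r of the observations in a local sub-cube, its edge must be r^(1/p). Keeping a fixed number of grid points along every axis requires that number raised to the power p observations. The helper names are hypothetical.

```python
def edge_length(fraction, p):
    """Edge of a sub-cube of [0, 1]^p that holds `fraction` of uniform data."""
    return fraction ** (1.0 / p)

def points_for_grid(per_axis, p):
    """Observations needed to keep `per_axis` grid points along each of p axes."""
    return per_axis ** p

for p in (1, 2, 10, 100):
    print(f"p={p:3d}: edge for 10% of data = {edge_length(0.1, p):.3f}")
for p in (1, 2, 3, 10):
    print(f"p={p:2d}: points for 10 per axis = {points_for_grid(10, p)}")
```

With r = 0.1, the edge is 0.1 for p = 1 but about 0.79 for p = 10. A "local" neighbourhood then spans most of each axis, while the data needed for the same grid density grows as 10^p.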

Chapter 4
LDA is a popular choice for classifying observations into more than two classes.
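Below is a minimal from-scratch sketch of LDA for K classes. It assumes NumPy and made-up, well-separated Gaussian data with three classes. The sketch estimates the class means, the priors and one pooled covariance matrix. It then assigns each x to the class with the largest discriminant δ_k(x) = xᵀΣ⁻¹μ_k − ½μ_kᵀΣ⁻¹μ_k + log π_k. The function names are hypothetical. In practice, scikit-learn's `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` is the usual library route.

```python
import numpy as np

def lda_fit(X, y):
    classes = np.unique(y)
    n, K = X.shape[0], len(classes)
    means = np.array([X[y == k].mean(axis=0) for k in classes])   # mu_k
    priors = np.array([np.mean(y == k) for k in classes])         # pi_k
    # Pooled covariance: the single Sigma that LDA assumes all classes share
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (n - K)
    return classes, means, priors, np.linalg.inv(cov)

def lda_predict(model, X):
    classes, means, priors, cov_inv = model
    # delta_k(x) = x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(pi_k)
    linear = X @ cov_inv @ means.T                                   # shape (n, K)
    const = -0.5 * np.sum((means @ cov_inv) * means, axis=1) + np.log(priors)
    return classes[np.argmax(linear + const, axis=1)]

rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])  # hypothetical classes
y = np.repeat([0, 1, 2], 100)
X = centres[y] + rng.normal(0.0, 1.0, (300, 2))

model = lda_fit(X, y)
accuracy = np.mean(lda_predict(model, X) == y)
print(f"training accuracy: {accuracy:.3f}")
```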


Why do we need another method when we already have logistic regression?
Reasons include:

- When the classes are well-separated, the parameter estimates for the logistic regression
model are surprisingly unstable. Linear discriminant analysis does not suffer from this
problem.
- If n is small and the distribution of the predictors X is approximately normal in each of the
classes, the linear discriminant model is again more stable than the logistic regression model.
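One way to see the instability in the first bullet uses a hypothetical toy case and assumes NumPy. When the classes are perfectly separated, the logistic log-likelihood keeps increasing as the slope grows, so no finite maximum-likelihood estimate exists. The sketch fits a no-intercept logistic regression by plain gradient ascent, a simplification chosen for brevity. It shows the slope still climbing as more steps are taken. The function `logistic_slope` and its learning rate are illustrative choices.

```python
import numpy as np

def logistic_slope(x, y, n_steps, lr=0.1):
    """Gradient ascent on the log-likelihood of P(Y=1 | x) = 1 / (1 + exp(-beta * x)).

    No intercept: a simplifying assumption for this sketch.
    """
    beta = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-beta * x))
        beta += lr * np.sum((y - p) * x)  # gradient of the log-likelihood
    return beta

x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 0, 1, 1, 1])  # perfectly separated at x = 0
for steps in (100, 1_000, 10_000):
    print(steps, round(logistic_slope(x, y, steps), 3))
```

For this data every term of the gradient is positive for any beta, so the estimate never settles. LDA's estimates, which are sample means and a pooled covariance, stay finite in the same situation.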


Check the videos on LDA/QDA.

Sensitivity is the percentage of true defaulters that are correctly identified.
Specificity is the percentage of non-defaulters that are correctly identified.
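Both rates come straight from the confusion matrix. Below is a minimal sketch, assuming NumPy and made-up labels where 1 = defaulter and 0 = non-defaulter. The name `sensitivity_specificity` is a hypothetical helper.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # defaulters flagged as defaulters
    fn = np.sum(y_true & ~y_pred)   # defaulters missed
    tn = np.sum(~y_true & ~y_pred)  # non-defaulters correctly cleared
    fp = np.sum(~y_true & y_pred)   # non-defaulters wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: 1 = defaulter, 0 = non-defaulter
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Here 2 of the 4 defaulters are caught, giving a sensitivity of 0.5. Also, 5 of the 6 non-defaulters are correctly cleared, giving a specificity of about 0.83.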

LDA is a much less flexible classifier than QDA, and so has substantially lower variance.
LDA tends to be a better bet than QDA if there are relatively few training observations and so
reducing variance is crucial.
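The flexibility gap can be quantified by counting covariance parameters. LDA estimates one shared p × p covariance matrix, which has p(p + 1)/2 free entries. QDA estimates a separate one for each of the K classes. The small sketch below is plain Python, and the function names are hypothetical.

```python
def covariance_parameters(p):
    # A symmetric p x p covariance matrix has p(p + 1)/2 free entries
    return p * (p + 1) // 2

def lda_cov_params(p):
    return covariance_parameters(p)  # one covariance matrix shared by all classes

def qda_cov_params(p, K):
    return K * covariance_parameters(p)  # a separate covariance matrix per class

for p in (2, 10, 50):
    print(p, lda_cov_params(p), qda_cov_params(p, K=3))
```

With p = 50 predictors, LDA's shared covariance has 1,275 parameters. QDA needs 2,550 for two classes and 3,825 for three, which is where its extra variance comes from.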

Seller: Jonnez