Summary: Complete exam material for Data Science Methods, Master Econometrics and Master Data Science & Business Analytics, University of Amsterdam

Complete summary of the exam material for the course Data Science Methods in the Master Econometrics and the Master Data Science and Business Analytics at the University of Amsterdam. The summary is in English. All lectures are covered, with extra information on some of the more complex topics.


Document information

Uploaded on: 16 October 2025
Number of pages: 34
Written in: 2025/2026
Type: Summary

Preview of the content

Data Science Methods
Contents
1 Model Evaluation
1.1 Linear Models for Regression
1.2 Generalization Error
1.3 The Bias-Variance Decomposition
1.4 Estimating the Expected Prediction Error
1.5 In-Sample Measures for Generalization Error: AIC and BIC
1.6 K-fold Cross-Validation (CV)

2 Shrinkage methods
2.1 Ridge Regression
2.2 Lasso Regression

3 Dimension reduction
3.1 Curse of Dimensionality
3.2 Feature Selection
3.3 Principal Component Analysis
3.4 Selecting the Number of Factors L
3.5 PCA versus Factor Analysis

4 Nonparametric Regression: k-Nearest Neighbors and Kernel Regression
4.1 k-Nearest Neighbors Method
4.2 Kernel Regression
4.3 The MSE of the NW Estimator
4.4 Local Linear Methods

5 Linear Discriminant Analysis
5.1 Classification
5.2 Decision Theory for Classification
5.3 Linear Methods for Classification
5.4 Linear Probability Model
5.5 LDA for Classification
5.6 Reduced Rank LDA
5.7 Fisher’s Linear Discriminant
5.8 QDA and Regularized Discriminant Analysis
5.9 Model Evaluation applied to Classification Problems

6 Logistic Regression and Stochastic Gradient Descent
6.1 Logistic Regression
6.2 Training Logistic Regression Models
6.3 Regularisation of Logistic Regression Models
6.4 Comparison of Logistic Regression and LDA
6.5 Newton-Raphson Method
6.6 Stochastic Gradient Descent

7 Clustering Methods
7.1 K-Means Clustering
7.2 Hierarchical Clustering

8 Bayesian Updating
8.1 Bayes’ Rule
8.2 Bayes Estimators
8.3 Bayesian Learning: Recursion
8.4 Bayesian Learning and Ridge Regression

9 Model Averaging
9.1 Weighting Schemes
9.2 Consistency and Asymptotic RMSE Optimality
9.3 Model Averaging for Gaussian Mixture Model

A Background
A.1 Jensen’s Inequality
A.2 Rayleigh Quotient
A.3 Logarithm Cribsheet
A.4 Distributions
A.5 Eigenvectors and Eigenvalues
A.6 The Lagrangian Method
A.7 Matrix Inverse

B Test Questions




1 Model Evaluation
Model performance is measured by how well a model generalizes. There are two potential objectives for model evaluation, and both can play a role.

• Model selection is comparing the performance of different models to identify the best model.
• Model assessment is estimating the ability of a model to perform on new data.

In data-rich situations we can use a train-validation-test split, and in cases of insufficient data we can use cross-validation; a minimal sketch of both strategies is shown below.
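The following sketch illustrates the two strategies. It assumes scikit-learn is available; the synthetic data, split proportions, and number of folds are illustrative choices, not part of the course material.

```python
# Minimal sketch: hold-out train/validation/test split vs. K-fold cross-validation.
# The regression data here is synthetic and purely illustrative.
import numpy as np
from sklearn.model_selection import train_test_split, KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=1.0, size=500)

# Data-rich case: validation set for model selection, test set for model assessment.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("validation MSE:", mean_squared_error(y_val, model.predict(X_val)))
print("test MSE:      ", mean_squared_error(y_test, model.predict(X_test)))

# Data-poor case: K-fold cross-validation reuses every observation for fitting and validation.
cv_errors = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    fold_model = LinearRegression().fit(X[train_idx], y[train_idx])
    cv_errors.append(mean_squared_error(y[val_idx], fold_model.predict(X[val_idx])))
print("5-fold CV estimate of the prediction error:", np.mean(cv_errors))
```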


1.1 Linear Models for Regression
Suppose we have $p$ features $X = (X_1, \ldots, X_p)^T$ in the feature space and the target variable $Y$. We consider the regression model of the form
$$Y = f(X, \beta) + \varepsilon, \quad \text{with} \quad f(X, \beta) = \sum_{m=0}^{M} \beta_m h_m(X), \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2) \;\;\text{(error term)}$$

• A linear regression model has basis functions $h_m(X)$, $m = 1, \ldots, M$, as the features, spanning an $M$-dimensional feature space.

We set up the log-likelihood function to derive the least squares problem and then maximize it with respect to the noise variance:

1. We have the likelihood function, which determines the model parameters $\beta_j$ and $\sigma_\varepsilon$, where $X$ is the $N \times (M+1)$ matrix with elements $X_{nm} = h_m(x_n)$ and $y = (y_1, \ldots, y_N)^T$:
$$P(y \mid X, \beta, \sigma_\varepsilon) = \prod_{n=1}^{N} N(y_n \mid f(x_n, \beta), \sigma_\varepsilon^2)$$
2. We take the logarithm of $P(y \mid X, \beta, \sigma_\varepsilon)$, where $E_D(\beta)$ is the sum-of-squared-errors function. This shows that maximizing the likelihood with respect to the $\beta_m$ is equivalent to minimizing the sum of squared errors:
$$\ln P(y \mid X, \beta, \sigma_\varepsilon) = -N \ln \sigma_\varepsilon - \frac{N}{2} \ln(2\pi) - \frac{E_D(\beta)}{\sigma_\varepsilon^2}$$
$$E_D(\beta) = \frac{1}{2} \sum_{n=1}^{N} (y_n - f(x_n, \beta))^2 = \frac{1}{2} \sum_{n=1}^{N} (y_n - \beta^T h(x_n))^2$$

3. We differentiate the log-likelihood function with respect to $\beta_m$:
$$\partial_{\beta_m} \ln P(y \mid X, \beta, \sigma_\varepsilon) = \frac{1}{\sigma_\varepsilon^2} \sum_{n=1}^{N} (y_n - \beta^T h(x_n))\, h_m(x_n)$$

4. We set these derivatives to zero for $m = 0, \ldots, M$ and solve for $\beta_m$; this gives the normal equations for the least squares problem:
$$\hat{\beta} = (X^T X)^{-1} X^T y = X^{+} y, \qquad X^{+} = (X^T X)^{-1} X^T \;\;\text{(Moore-Penrose inverse)}$$

5. We maximize the log-likelihood function with respect to the noise variance $\sigma_\varepsilon^2$:
$$\hat{\sigma}_\varepsilon^2 = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{\beta}^T h(x_n))^2$$
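The closed-form solution above can be checked numerically. Below is a minimal NumPy sketch, not taken from the course material: the polynomial basis $h_m(x) = x^m$ and the simulated data are illustrative assumptions. It builds the design matrix, solves the normal equations via the Moore-Penrose pseudoinverse, and computes the maximum-likelihood noise variance.

```python
# Minimal sketch of the least squares solution via the normal equations.
# The polynomial basis h_m(x) = x**m and the simulated data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
N, M = 200, 3
x = rng.uniform(-2, 2, size=N)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=N)

# Design matrix with elements X[n, m] = h_m(x_n); h_0(x) = 1 gives the intercept.
X = np.column_stack([x**m for m in range(M + 1)])          # shape (N, M+1)

# Normal equations: beta_hat = (X'X)^{-1} X'y, i.e. the Moore-Penrose pseudoinverse applied to y.
beta_hat = np.linalg.pinv(X) @ y

# Maximum-likelihood estimate of the noise variance.
residuals = y - X @ beta_hat
sigma2_hat = np.mean(residuals**2)

print("beta_hat:", beta_hat)
print("sigma2_hat:", sigma2_hat)
```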


1.2 Generalization Error
We have the loss functions for a trained regression model $\hat{f}(X)$:
$$L(Y, \hat{f}(X)) = (Y - \hat{f}(X))^2 \quad \text{(squared error)}$$
$$L(Y, \hat{f}(X)) = |Y - \hat{f}(X)| \quad \text{(absolute error)}$$

The generalization error shows how well the model predicts responses for new data independently drawn from the same population distribution. For the data set $\mathcal{T} = \{(x_n, y_n)\}_{n=1}^{N}$:
$$\mathrm{err}_{\mathcal{T}} = E_{(X,Y)}[L(Y, \hat{f}(X)) \mid \mathcal{T}]$$

The expected prediction error quantifies how well a predictive model is expected to perform on new, unseen data:
$$\mathrm{err} = E_{\mathcal{T},(X,Y)}[L(Y, \hat{f}(X))] = E_{\mathcal{T}}[\mathrm{err}_{\mathcal{T}}]$$

The training error is the average loss over the set $\mathcal{T}$ the model was trained on:
$$\overline{\mathrm{err}} = \frac{1}{N} \sum_{n=1}^{N} L(y_n, \hat{f}(x_n))$$

• The prediction error is the average discrepancy between the model’s predictions and the true values of the dependent variable for new observations.
• The prediction error is the expectation of the generalization error averaged over all possible sets of observations $\mathcal{T}$, because the observations are drawn independently from the same joint distribution as $(X, Y)$.
• The generalization error should be small to ensure a low prediction error on unseen data.
• The generalization error can often not be estimated directly, so we use an estimate of the expected prediction error instead.
• The training error is not a reliable indicator of generalization performance, as we can make the training error arbitrarily small without improving generalization performance (see the simulation sketch after this list).
• Overfitting is when the model is too tailored to the specifics of the noise in the training set.
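The sketch below illustrates the last two points. The data-generating process, polynomial fits, and large hold-out sample are assumptions chosen purely for this example: as the model becomes more flexible, the training error keeps falling while the estimated prediction error eventually rises.

```python
# Minimal sketch (illustrative data): training error vs. an estimate of the
# expected prediction error as model complexity grows.
import numpy as np

rng = np.random.default_rng(2)

def simulate(n):
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)   # true mean f(x) = sin(3x)
    return x, y

x_train, y_train = simulate(30)
x_test, y_test = simulate(10_000)    # large independent sample approximates err

for degree in (1, 3, 6, 12):
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((y_train - np.polyval(coefs, x_train))**2)
    test_err = np.mean((y_test - np.polyval(coefs, x_test))**2)
    print(f"degree {degree:2d}: training error {train_err:.3f}, test error {test_err:.3f}")
```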


1.3 The Bias-Variance Decomposition
The prediction error can be decomposed into three terms: the bias (squared) of the estimated model,
plus the variance of the estimated model, plus the variance of the Gaussian noise.

• The bias term measures how much on average our estimated model deviates from the true mean, given by the function $f(X)$.
• The variance term is the expected (squared) deviation of the estimated model around its mean.
• The third term is an irreducible error, due to the inherent variance in the data-generating process around its true mean $f(X)$.


$$\begin{aligned}
\mathrm{err}(x_0) &= E[(Y - \hat{f}(x_0))^2 \mid X = x_0] \\
&= (E[\hat{f}(x_0)] - f(x_0))^2 + E[(\hat{f}(x_0) - E[\hat{f}(x_0)])^2] + \sigma_\varepsilon^2 \\
&= \mathrm{bias}^2(\hat{f}(x_0)) + \mathrm{Var}(\hat{f}(x_0)) + \sigma_\varepsilon^2 \\
&= \mathrm{bias}^2 + \mathrm{variance} + \sigma_\varepsilon^2
\end{aligned}$$
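A Monte Carlo sketch of this decomposition at a single point $x_0$ is given below; the true function, noise level, and polynomial estimator are illustrative assumptions, not from the course. Repeatedly drawing training sets, refitting the model, and evaluating at $x_0$ recovers bias$^2$ + variance + $\sigma_\varepsilon^2$ as an estimate of $\mathrm{err}(x_0)$.

```python
# Minimal sketch (illustrative setup): Monte Carlo check of the bias-variance
# decomposition at a single point x0 for a polynomial regression estimator.
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(3 * x)           # true mean function
sigma_eps = 0.3                        # noise standard deviation
x0, degree, n, reps = 0.5, 2, 50, 2000

preds = np.empty(reps)
for r in range(reps):
    x = rng.uniform(-1, 1, size=n)
    y = f(x) + rng.normal(scale=sigma_eps, size=n)        # new training set T each repetition
    preds[r] = np.polyval(np.polyfit(x, y, degree), x0)   # fitted model evaluated at x0

bias2 = (preds.mean() - f(x0))**2
variance = preds.var()
print("bias^2 + variance + sigma_eps^2:", bias2 + variance + sigma_eps**2)

# Direct estimate of err(x0): squared error against fresh, independent draws of Y at x0.
y_new = f(x0) + rng.normal(scale=sigma_eps, size=reps)
print("direct Monte Carlo estimate:    ", np.mean((y_new - preds)**2))
```

The two printed numbers should agree up to simulation noise, which is the content of the decomposition above.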


