Data Science Methods Overview
CHoogteijling

Contents

1 Model Evaluation
  1.1 Linear Models for Regression
  1.2 Generalization Error
  1.3 The Bias-Variance Decomposition
  1.4 Estimating the Expected Prediction Error
  1.5 In-Sample Measures for Generalization Error: AIC and BIC
  1.6 K-fold Cross-Validation (CV)
2 Shrinkage Methods
  2.1 Ridge Regression
  2.2 Lasso Regression
3 Dimension Reduction
  3.1 Curse of Dimensionality
  3.2 Feature Selection
  3.3 Principal Component Analysis
  3.4 Selecting the Number of Factors L
  3.5 PCA versus Factor Analysis
4 Nonparametric Regression: k-Nearest Neighbors and Kernel Regression
  4.1 k-Nearest Neighbors Method
  4.2 Kernel Regression
  4.3 The MSE of the NW Estimator
  4.4 Local Linear Methods
5 Linear Discriminant Analysis
  5.1 Classification
  5.2 Decision Theory for Classification
  5.3 Linear Methods for Classification
  5.4 Linear Probability Model
  5.5 LDA for Classification
  5.6 Reduced Rank LDA
  5.7 Fisher’s Linear Discriminant
  5.8 QDA and Regularized Discriminant Analysis
  5.9 Model Evaluation Applied to Classification Problems
6 Logistic Regression and Stochastic Gradient Descent
  6.1 Logistic Regression
  6.2 Training Logistic Regression Models
  6.3 Regularisation of Logistic Regression Models
  6.4 Comparison of Logistic Regression and LDA
  6.5 Newton-Raphson Method
  6.6 Stochastic Gradient Descent
7 Clustering Methods
  7.1 K-Means Clustering
  7.2 Hierarchical Clustering
8 Bayesian Updating
  8.1 Bayes’ Rule
  8.2 Bayes Estimators
  8.3 Bayesian Learning: Recursion
  8.4 Bayesian Learning and Ridge Regression
9 Model Averaging
  9.1 Weighting Schemes
  9.2 Consistency and Asymptotic RMSE Optimality
  9.3 Model Averaging for Gaussian Mixture Model
A Background
  A.1 Jensen’s Inequality
  A.2 Rayleigh Quotient
  A.3 Logarithm Cribsheet
  A.4 Distributions
  A.5 Eigenvectors and Eigenvalues
  A.6 The Lagrangian Method
  A.7 Matrix Inverse
B Test Questions
1 Model Evaluation
Model performance is measured by how well a model generalizes. Model evaluation has two potential objectives, and both can play a role.
• Model selection is comparing the performance of different models to identify the best model.
• Model assessment is estimating the ability of a model to perform on new data.
In data-rich situations we can use a train-validation-test split; when data are insufficient we can use cross-validation.
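As a minimal sketch of the data-rich case (hypothetical data; the 60/20/20 proportions and the array names are illustrative assumptions, not a prescription):

```python
import numpy as np

# Hypothetical data set: 1000 observations, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

# Shuffle the indices once, then cut into 60% train / 20% validation / 20% test.
idx = rng.permutation(len(y))
n_train, n_val = int(0.6 * len(y)), int(0.2 * len(y))
train, val, test = np.split(idx, [n_train, n_train + n_val])

X_train, y_train = X[train], y[train]  # fit the candidate models here
X_val, y_val = X[val], y[val]          # model selection: compare candidates
X_test, y_test = X[test], y[test]      # model assessment: estimate err once, at the end
```

Keeping the test set untouched until the very end is what makes the final error estimate honest: reusing it during model selection would turn it into a second validation set.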
1.1 Linear Models for Regression
Suppose we have p features X = (X_1, . . . , X_p)ᵀ in the feature space and the target variable Y. We consider the regression model of the form
Y = f(X, β) + ε,  with

f(X, β) = Σ_{m=0}^{M} β_m h_m(X),   ε ~ N(0, σ_ε²) the error term
• A linear regression model has basis functions h_m(X), m = 1, . . . , M, as the features, spanning an M-dimensional feature space.
We set up the log-likelihood function to derive the least squares problem and then maximize with respect to the noise variance:
1. We have the likelihood function, which determines the model parameters β_m and σ_ε. Here X is the N × (M + 1) matrix with elements X_nm = h_m(x_n) and y = (y_1, . . . , y_N)ᵀ.

   P(y | X, β, σ_ε) = ∏_{n=1}^{N} N(y_n | f(x_n, β), σ_ε²)
2. We take the logarithm of P(y | X, β, σ_ε), where E_D(β) is the sum-of-squared-errors function. This shows that maximizing the likelihood with respect to the β_m is equivalent to minimizing the sum of squared errors.

   ln P(y | X, β, σ_ε) = −N ln σ_ε − (N/2) ln(2π) − E_D(β)/σ_ε²

   E_D(β) = (1/2) Σ_{n=1}^{N} (y_n − f(x_n, β))² = (1/2) Σ_{n=1}^{N} (y_n − βᵀh(x_n))²
3. We differentiate the log-likelihood function with respect to β_m:

   ∂/∂β_m ln P(y | X, β, σ_ε) = (1/σ_ε²) Σ_{n=1}^{N} (y_n − βᵀh(x_n)) h_m(x_n)
4. We set these derivatives to zero for m = 0, . . . , M and solve for β_m; this yields the normal equations for the least squares problem:

   β̂ = (XᵀX)⁻¹Xᵀy = X⁺y

   X⁺ = (XᵀX)⁻¹Xᵀ    Moore–Penrose pseudoinverse
5. We maximize the log-likelihood function with respect to the noise variance σ_ε²:

   σ̂_ε² = (1/N) Σ_{n=1}^{N} (y_n − β̂ᵀh(x_n))²
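Steps 4 and 5 above can be checked numerically. The sketch below assumes a hypothetical cubic basis h_m(x) = x^m and simulated data; it is an illustration of the normal equations, not a production fit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: scalar inputs, cubic basis h_m(x) = x^m, m = 0, ..., M.
N, M = 200, 3
x = rng.uniform(-1, 1, size=N)
beta_true = np.array([0.5, -1.0, 2.0, 0.3])     # assumed coefficients
X = np.vander(x, M + 1, increasing=True)        # N x (M+1) design matrix, X[n, m] = h_m(x_n)
y = X @ beta_true + rng.normal(0, 0.1, size=N)  # targets with Gaussian noise

# Step 4: normal equations via the Moore-Penrose pseudoinverse X^+.
beta_hat = np.linalg.pinv(X) @ y                # equals (X^T X)^{-1} X^T y when X^T X is invertible

# Step 5: the ML estimate of the noise variance is the mean squared residual.
sigma2_hat = np.mean((y - X @ beta_hat) ** 2)
```

With N = 200 observations and noise standard deviation 0.1, beta_hat lands close to beta_true and sigma2_hat close to the true variance 0.01.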
1.2 Generalization Error
We have the loss functions for a trained regression model f̂(X):

L(Y, f̂(X)) = (Y − f̂(X))²    squared error
L(Y, f̂(X)) = |Y − f̂(X)|     absolute error

The generalization error shows how well the model predicts responses for new data independently drawn from the same population distribution. For the data set T = {(x_n, y_n)}_{n=1}^{N}:
err_T = E_{(X,Y)}[L(Y, f̂(X)) | T]

The expected prediction error quantifies how well a predictive model is expected to perform on new, unseen data.

err = E_{T,(X,Y)}[L(Y, f̂(X))] = E_T[err_T]
The training error is the average loss on the set T the model was trained on:

err_train = (1/N) Σ_{n=1}^{N} L(y_n, f̂(x_n))
• The prediction error is the average discrepancy between the model’s predictions and the true values
of the dependent variable for new observations.
• The prediction error is the generalization error averaged over all possible sets of observations T; this averaging is meaningful because the observations are drawn independently from the same joint distribution as (X, Y).
• The generalization error should be small to ensure low prediction error on unseen data.
• The generalization error can often not be estimated directly, so we use an estimate of the expected prediction error instead.
• The training error is not a reliable indicator of generalization performance, as we can make the training error arbitrarily small without improving generalization performance.
• Overfitting occurs when the model is too tailored to the specifics of the noise in the training set.
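These points can be made concrete with a small simulation; the true function sin(2x), the noise level, and the polynomial degrees below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    # Hypothetical data-generating process: f(x) = sin(2x) plus Gaussian noise.
    x = rng.uniform(-1, 1, size=n)
    return x, np.sin(2 * x) + rng.normal(0, 0.3, size=n)

x_tr, y_tr = make_data(30)        # small training set T
x_new, y_new = make_data(5000)    # large fresh sample approximates err

def errors(degree):
    coef = np.polyfit(x_tr, y_tr, degree)                  # least squares polynomial fit
    tr = np.mean((y_tr - np.polyval(coef, x_tr)) ** 2)     # training error
    new = np.mean((y_new - np.polyval(coef, x_new)) ** 2)  # error on unseen data
    return tr, new

tr3, new3 = errors(3)
tr15, new15 = errors(15)
# The degree-15 fit has the smaller training error, but its error on
# fresh data stays well above it: the extra flexibility fits the noise.
```

Because the degree-3 basis is nested in the degree-15 basis, the training error can only decrease as the degree grows, while the fresh-data error does not follow.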
1.3 The Bias-Variance Decomposition
The prediction error can be decomposed into three terms: the bias (squared) of the estimated model,
plus the variance of the estimated model, plus the variance of the Gaussian noise.
• The bias term measures how much on average our estimated model deviates from the true mean,
given by the function f (X).
• The variance term is the expected (squared) deviation of the estimated model around its mean.
• The third term is an irreducible error, due to the inherent variance in the data-generating process
around its true mean f (X).
err[x_0] = E[(Y − f̂(X))² | X = x_0]
         = (E[f̂(x_0)] − f(x_0))² + E[(f̂(x_0) − E[f̂(x_0)])²] + σ_ε²
         = bias²(f̂(x_0)) + Var(f̂(x_0)) + σ_ε²
         = bias² + variance + σ_ε²
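The decomposition can be verified by Monte Carlo simulation. The sketch below assumes a true function sin(2x) and deliberately underfits with a straight line, so the bias term is clearly visible; all numerical choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    return np.sin(2 * x)     # assumed true regression function

sigma = 0.3                  # noise standard deviation (sigma_eps)
x0, n, reps = 0.5, 50, 4000  # evaluation point, training size, replications

preds = np.empty(reps)       # f_hat(x0) across training sets
sq_err = np.empty(reps)      # (Y - f_hat(x0))^2 for a new Y at x0
for r in range(reps):
    x = rng.uniform(-1, 1, size=n)         # fresh training set T
    y = f(x) + rng.normal(0, sigma, size=n)
    coef = np.polyfit(x, y, 1)             # deliberately underfit: a line
    preds[r] = np.polyval(coef, x0)
    y0 = f(x0) + rng.normal(0, sigma)      # new observation at X = x0
    sq_err[r] = (y0 - preds[r]) ** 2

bias2 = (preds.mean() - f(x0)) ** 2        # squared bias of f_hat at x0
variance = preds.var()                     # variance of f_hat at x0
decomposed = bias2 + variance + sigma**2   # the three terms
direct = sq_err.mean()                     # Monte Carlo estimate of err[x0]
# decomposed and direct agree up to Monte Carlo error.
```

Refitting with a more flexible model would shrink bias2 and inflate variance; the irreducible sigma**2 term is untouched either way.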