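Auxiliary: $\Sigma_{X,Y} := E\left[(X - E[X])(Y - E[Y])^\top\right]$, $f_{X_1,\dots,X_p}(x) = \dfrac{\exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)}{\sqrt{(2\pi)^p\,|\Sigma|}}$, $P(Y = k) = \dfrac{e^{-\lambda}\lambda^k}{k!}$
Intro: $E\left[(y_0 - \hat f(x_0))^2\right] = \mathrm{Var}(\hat f(x_0)) + \mathrm{Bias}(\hat f(x_0))^2 + \mathrm{Var}(\varepsilon)$,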
OLS: $\hat\beta = (X^\top X)^{-1} X^\top y$
Model selection: $\mathrm{AIC} = 2p - 2\log(L)$, $\mathrm{AICc} = \mathrm{AIC} + \dfrac{2p(p+1)}{n - p - 1}$, $\mathrm{BIC} = \log(n)\,p - 2\log(L)$
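The three criteria can be computed directly from a fitted model's maximized log-likelihood. The following is a minimal sketch, not a library API: the function names and the argument name `log_lik` (standing for $\log(L)$) are assumptions made for illustration.

```python
import math


def aic(p: int, log_lik: float) -> float:
    """AIC = 2p - 2 log(L), with p parameters and maximized log-likelihood log_lik."""
    return 2 * p - 2 * log_lik


def aicc(p: int, n: int, log_lik: float) -> float:
    """AICc = AIC + 2p(p+1) / (n - p - 1), the small-sample correction."""
    if n - p - 1 <= 0:
        raise ValueError("AICc requires n > p + 1")
    return aic(p, log_lik) + 2 * p * (p + 1) / (n - p - 1)


def bic(p: int, n: int, log_lik: float) -> float:
    """BIC = log(n) p - 2 log(L)."""
    return math.log(n) * p - 2 * log_lik
```

For all three criteria, a smaller value indicates a preferred model among candidates fitted to the same data.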
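Ridge: $\hat\beta_0 = \bar y - \sum_{j=1}^{p} \beta_j \bar x^{(j)}$, $\hat\beta^* = [\beta_1, \cdots, \beta_p]^\top = \left(X^{*\top} X^* + \lambda I\right)^{-1} X^{*\top} y^*$, where $x^*_{i,j} = x_{i,j} - \bar x^{(j)}$,
$\bar x^{(j)} = \dfrac{1}{n}\sum_{i=1}^{n} x_{i,j}$, $y^*_i = y_i - \bar y$, $\bar y = \dfrac{1}{n}\sum_{i=1}^{n} y_i$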
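To make the centering step in the ridge formulas concrete, here is a minimal sketch restricted to a single predictor ($p = 1$), an assumption chosen so that the matrix inverse reduces to a scalar division. The function name `ridge_1d` is hypothetical.

```python
def ridge_1d(x: list[float], y: list[float], lam: float) -> tuple[float, float]:
    """Ridge fit with one predictor; returns (beta0_hat, beta1_hat).

    Assumption: p = 1, so (X*^T X* + lam I)^{-1} X*^T y* is a ratio of two sums.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered data: x*_i = x_i - x_bar, y*_i = y_i - y_bar
    xs = [xi - x_bar for xi in x]
    ys = [yi - y_bar for yi in y]
    # Slope from the centered problem; lam > 0 shrinks it toward zero
    beta1 = sum(a * b for a, b in zip(xs, ys)) / (sum(a * a for a in xs) + lam)
    # The intercept is recovered afterwards and is not penalized
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1
```

With `lam = 0` this reduces to simple least squares. Note that the denominator is zero when `lam = 0` and all `x` values are equal.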
Bayes: $f(\beta \mid X, y) \propto f(y \mid \beta, X)\,p(\beta)$, $\hat\beta = \arg\max_{\beta} f(\beta \mid X, y)$
Step: $C_j(X) = 1_{\{c_j \le X < c_{j+1}\}}$, $j = 0, \dots, K$, $c_0 = -\infty$, $c_{K+1} = +\infty$, $Y = \sum_{j=0}^{K} \sum_{\ell=0}^{d} \beta_{j,\ell}\, C_j(X)\, X^{\ell} + \varepsilon$
Splines: $f(x) = \sum_{j=0}^{3} \beta_j x^j + \sum_{j=4}^{K+3} \beta_j b_j(x)$, $b_j(x) := h(x, c_{j-3})$, $h(x, \xi) = (x - \xi)^3_+$
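The cubic spline above is a linear model in the truncated power basis, which can be sketched as follows. The function names are hypothetical, and `knots` is assumed to hold $c_1, \dots, c_K$ in order.

```python
def h(x: float, xi: float) -> float:
    """Truncated cubic h(x, xi) = (x - xi)^3 if x > xi, else 0."""
    return max(x - xi, 0.0) ** 3


def spline_basis_row(x: float, knots: list[float]) -> list[float]:
    """Basis values [1, x, x^2, x^3, h(x, c_1), ..., h(x, c_K)] at a single point x."""
    return [x ** j for j in range(4)] + [h(x, c) for c in knots]


def spline_eval(x: float, knots: list[float], beta: list[float]) -> float:
    """f(x) = sum_j beta_j * basis_j(x), with beta = [beta_0, ..., beta_{K+3}]."""
    row = spline_basis_row(x, knots)
    if len(beta) != len(row):
        raise ValueError("beta must have K + 4 entries")
    return sum(b * v for b, v in zip(beta, row))
```

For example, `spline_basis_row(2.0, [1.0, 3.0])` gives `[1.0, 2.0, 4.0, 8.0, 1.0, 0.0]`: the knot at 3 has not been passed yet, so its basis function is still zero.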
GLMs: $g(E[Y_i \mid X_i]) = X_i^\top \beta$, $f(y_i; \theta_i, \phi) = \exp\left(\dfrac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi)\right)$, $\theta_i = g(E[Y_i])$ (canonical)
$F_Y(Y) \sim \mathrm{Uniform}[0, 1]$,
GAMs: $g(E[Y_i \mid X_i]) = \sum_{j=1}^{p} f_j(X_{i,j})$,
Classification: $\hat Y = \arg\max_{c \in C} P(Y = c \mid X = x)$, $1 - \max_{c \in C} P(Y = c \mid X = x)$
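LDA: $P(Y = c \mid X = x_0) \propto \phi_c(x_0)\,\pi_c$, $\pi_c = P(Y = c)$, $\phi_c(x_i) := f_{X \mid Y}(x_i \mid c)$
$\hat y_0 = \arg\max_{c \in C}\; c_c + b_c^\top x_0$, where $c_c := \log(\hat\pi_c) - \frac{1}{2}\hat\mu_c^\top \hat\Sigma^{-1} \hat\mu_c$, $b_c^\top = \hat\mu_c^\top \hat\Sigma^{-1}$
$\hat\Sigma_{j,k} = \dfrac{1}{n}\sum_{i=1}^{n} \tilde x_{i,j}\, \tilde x_{i,k}$, $\tilde x_i := x_i - \hat\mu_{y_i}$, $\hat\pi_c = \dfrac{\mathrm{card}(S_c)}{n}$, $\hat\mu_c = \dfrac{1}{\mathrm{card}(S_c)}\sum_{i \in S_c} x_i$.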
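QDA: $\hat y_0 = \arg\max_{c \in C}\; \tilde c_c + \tilde b_c^\top x_0 + x_0^\top \tilde A_c x_0$, where $\tilde A_c := -\frac{1}{2}\hat\Sigma_c^{-1}$, $\tilde b_c^\top := \hat\mu_c^\top \hat\Sigma_c^{-1}$,
$\tilde c_c := \log(\hat\pi_c) - \frac{1}{2}\log|\hat\Sigma_c| - \frac{1}{2}\hat\mu_c^\top \hat\Sigma_c^{-1} \hat\mu_c$, $\hat\Sigma_c = \dfrac{1}{\mathrm{card}(S_c) - 1}\sum_{i \in S_c} (x_i - \hat\mu_c)(x_i - \hat\mu_c)^\top$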
Logistic: $\zeta(z) = \dfrac{e^z}{1 + e^z}$, $P(Y_i = 1 \mid X_i = x_i) = E[Y_i \mid X_i = x_i] = \zeta\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{i,j}\right)$,
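The logistic model's success probability can be evaluated as sketched below. The function names are hypothetical. The branch on the sign of `z` is an implementation choice, not part of the formula: it computes the same $\zeta(z)$ while avoiding overflow in `math.exp` for large $|z|$.

```python
import math


def zeta(z: float) -> float:
    """Logistic function zeta(z) = e^z / (1 + e^z), evaluated without overflow."""
    if z >= 0:
        # Algebraically identical form: 1 / (1 + e^{-z})
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def prob_y1(x: list[float], beta0: float, beta: list[float]) -> float:
    """P(Y = 1 | X = x) = zeta(beta0 + sum_j beta_j * x_j)."""
    return zeta(beta0 + sum(b * xj for b, xj in zip(beta, x)))
```

A linear predictor of zero corresponds to a probability of exactly one half, so `prob_y1([1.0, 2.0], -1.0, [1.0, 0.0])` returns `0.5`.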
Loss: $\hat y_0 = \arg\min_{c \in C} E[L(Y, c) \mid X = x_0]$
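The loss-based decision rule can be sketched for a finite class set as follows. The function name `bayes_decision`, the dictionary representation of the posterior $P(Y = y \mid X = x_0)$, and the example losses are all assumptions made for illustration.

```python
from typing import Callable, Hashable


def bayes_decision(
    posterior: dict[Hashable, float],
    loss: Callable[[Hashable, Hashable], float],
) -> Hashable:
    """Return the class c minimizing E[L(Y, c) | X = x0].

    posterior maps each class y to P(Y = y | X = x0); loss(y, c) is the cost
    of predicting c when the true class is y.
    """
    classes = list(posterior)

    def expected_loss(c: Hashable) -> float:
        return sum(posterior[y] * loss(y, c) for y in classes)

    return min(classes, key=expected_loss)


def zero_one(y: Hashable, c: Hashable) -> float:
    """0-1 loss: minimizing it is the same as maximizing the posterior."""
    return 0.0 if y == c else 1.0


def costly_miss(y: Hashable, c: Hashable) -> float:
    """Hypothetical asymmetric loss: missing a true "b" costs 10, other errors cost 1."""
    if y == c:
        return 0.0
    return 10.0 if y == "b" else 1.0
```

With posterior `{"a": 0.7, "b": 0.3}`, the 0-1 loss selects `"a"`, while `costly_miss` selects `"b"` because the expected loss of predicting `"a"` (3.0) exceeds that of predicting `"b"` (0.7).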