Solution Manual For
The Elements of Statistical Learning by Jerome Friedman, Trevor Hastie, and Robert Tibshirani
John L. Weatherwax∗ David Epstein†
Chapter 2 (Overview of Supervised Learning)
Statistical Decision Theory
We assume a linear model: that is, we assume $y = f(x) + \varepsilon$, where $\varepsilon$ is a random variable
with mean 0 and variance $\sigma^2$, and $f(x) = x^T \beta$. Our expected prediction error (EPE) under
the squared error loss is
\[ \mathrm{EPE}(\beta) = \int (y - x^T \beta)^2 \, \Pr(dx, dy) . \tag{1} \]
We regard this expression as a function of $\beta$, a column vector of length $p + 1$. In order to
find the value of $\beta$ for which it is minimized, we equate to zero the vector derivative with
respect to $\beta$. We have
\[ \frac{\partial \mathrm{EPE}}{\partial \beta} = \int 2 (y - x^T \beta)(-1) x \, \Pr(dx, dy) = -2 \int (y - x^T \beta) x \, \Pr(dx, dy) . \tag{2} \]
Now this expression has two parts. The first has integrand $yx$ and the second has integrand
$(x^T \beta) x$.
Before proceeding, we make a quick general remark about matrices. Suppose that $A$, $B$ and
$C$ are matrices of size $1 \times p$, $p \times 1$ and $q \times 1$ respectively, where $p$ and $q$ are positive
integers. Then $AB$ can be regarded as a scalar, and we have $(AB)C = C(AB)$, each of these
expressions meaning that each component of $C$ is multiplied by the scalar $AB$. If $q > 1$,
the expressions $BC$, $A(BC)$ and $ABC$ are meaningless, and we must avoid writing them.
On the other hand, $CAB$ is meaningful as a product of three matrices, and the result is the
$q \times 1$ matrix $(AB)C = C(AB) = CAB$. In our situation we obtain $(x^T \beta) x = x x^T \beta$.
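The identity $(x^T \beta) x = x x^T \beta$ can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`dot`, `outer`, `matvec`) and the particular vectors are illustrative assumptions, not part of the original text.

```python
def dot(u, v):
    """Inner product u^T v of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def outer(u, v):
    """Outer product u v^T, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def matvec(m, v):
    """Matrix-vector product m v."""
    return [dot(row, v) for row in m]

# Illustrative (assumed) values for x and beta.
x = [1.0, 2.0, -3.0]
beta = [0.5, -1.0, 2.0]

# Left-hand side: the scalar x^T beta multiplies each component of x.
scalar_times_x = [dot(x, beta) * xi for xi in x]

# Right-hand side: the matrix x x^T applied to beta.
outer_times_beta = matvec(outer(x, x), beta)

print(scalar_times_x)
print(outer_times_beta)
```

Both lists agree component by component, which is the rearrangement used to pass from the integrand $(x^T \beta) x$ to $x x^T \beta$.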
From $\partial \mathrm{EPE} / \partial \beta = 0$ we deduce
\[ E[yx] - E[x x^T \beta] = 0 \tag{3} \]
for the particular value of $\beta$ that minimizes the EPE. Since this value of $\beta$ is a constant, it
can be taken out of the expectation to give
\[ \beta = E[x x^T]^{-1} E[yx] , \tag{4} \]
which gives Equation 2.16 in the book.
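As a rough illustration of $\beta = E[x x^T]^{-1} E[yx]$, the sketch below replaces the expectations by sample averages for $x = (1, x_1)^T$ (an intercept plus one feature) and solves the resulting $2 \times 2$ system in closed form. The function name `moment_beta`, the true coefficients, the noise level and the sample size are all assumptions chosen for the example.

```python
import random

def moment_beta(xs, ys):
    """Solve E[x x^T] beta = E[y x] for x = (1, x1), using sample averages."""
    n = len(xs)
    # Entries of the 2x2 moment matrix E[x x^T].
    m00 = 1.0
    m01 = sum(xs) / n
    m11 = sum(v * v for v in xs) / n
    # Entries of the vector E[y x].
    b0 = sum(ys) / n
    b1 = sum(v * y for v, y in zip(xs, ys)) / n
    # Apply the closed-form inverse of a symmetric 2x2 matrix to (b0, b1).
    det = m00 * m11 - m01 * m01
    beta0 = (m11 * b0 - m01 * b1) / det
    beta1 = (m00 * b1 - m01 * b0) / det
    return beta0, beta1

rng = random.Random(0)
true_beta = (1.0, 2.0)   # assumed intercept and slope
sigma = 0.5              # assumed noise standard deviation
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
ys = [true_beta[0] + true_beta[1] * v + rng.gauss(0.0, sigma) for v in xs]

beta_hat = moment_beta(xs, ys)
print(beta_hat)
```

With a large sample the recovered pair should lie close to the assumed `true_beta`; the agreement is statistical, not exact.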
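We now discuss some points around Equations 2.26 and 2.27. We have
\[ \hat\beta = (X^T X)^{-1} X^T y = (X^T X)^{-1} X^T (X \beta + \varepsilon) = \beta + (X^T X)^{-1} X^T \varepsilon . \]
So
\[ \hat y_0 = x_0^T \hat\beta = x_0^T \beta + x_0^T (X^T X)^{-1} X^T \varepsilon . \tag{5} \]
This immediately gives
\[ \hat y_0 = x_0^T \beta + \sum_{i=1}^{N} \ell_i(x_0) \varepsilon_i , \]
where $\ell_i(x_0)$ is the $i$-th element of the length-$N$ vector $X (X^T X)^{-1} x_0$.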
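The weight representation can be verified on simulated data. This is a minimal sketch for $p = 2$ using only the standard library; the helper `solve_2x2`, the coefficient vector, the query point and the noise level are assumptions made for the example.

```python
import random

def solve_2x2(a, b):
    """Solve the 2x2 linear system a z = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    z0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    z1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [z0, z1]

rng = random.Random(1)
N = 50
beta = [1.0, -2.0]   # assumed true coefficients
x0 = [0.5, 1.5]      # assumed query point
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(N)]
eps = [rng.gauss(0.0, 0.3) for _ in range(N)]
y = [r[0] * beta[0] + r[1] * beta[1] + e for r, e in zip(X, eps)]

# Gram matrix X^T X and the vector X^T y.
gram = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]

# Least squares fit and its prediction at x0.
beta_hat = solve_2x2(gram, xty)
y0_hat = x0[0] * beta_hat[0] + x0[1] * beta_hat[1]

# Weights l_i(x0): the i-th element of X (X^T X)^{-1} x0.
w = solve_2x2(gram, x0)
ell = [r[0] * w[0] + r[1] * w[1] for r in X]

# x0^T beta plus the weighted noise should reproduce the prediction.
y0_from_noise = x0[0] * beta[0] + x0[1] * beta[1] + sum(l * e for l, e in zip(ell, eps))
# Equivalently, the prediction is a weighted sum of the training responses.
y0_from_responses = sum(l * yi for l, yi in zip(ell, y))

print(y0_hat, y0_from_noise, y0_from_responses)
```

The three printed numbers agree up to floating-point rounding, showing that the prediction is linear in the noise with weights $\ell_i(x_0)$.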
In order to prove Equations 2.27 and 2.28, we need to square the expression in Equation 6
and then apply various expectation operators. First we consider the properties of each of
the three terms $U_i$ in Equation 6, where $U_1 = y_0 - x_0^T \beta$, $U_2 = \hat y_0 - E_T \hat y_0$ and
$U_3 = E_T \hat y_0 - x_0^T \beta$. We have $E_{y_0|x_0} U_1 = 0$ and $E_T U_1 = U_1 E_T$. Despite the
notation, $\hat y_0$ does not involve $y_0$. So $E_{y_0|x_0} U_2 = U_2 E_{y_0|x_0}$ and clearly $E_T U_2 = 0$. Equation 5
gives
\[ U_3 = E_T(\hat y_0) - x_0^T \beta = x_0^T E_X\big( (X^T X)^{-1} X^T E_{Y|X}\, \varepsilon \big) = 0 \tag{7} \]
since the expectation of the length-$N$ vector $\varepsilon$ is zero. This shows that the bias $U_3$ is zero.
We now square the remaining part of Equation 6 and then apply $E_{y_0|x_0} E_T$. The cross-term
$U_1 U_2$ gives zero, since $E_{y_0|x_0}(U_1 U_2) = U_2 E_{y_0|x_0}(U_1) = 0$. (This works in the same way
if $E_T$ replaces $E_{y_0|x_0}$.)
We are left with two squared terms, and the definition of variance enables us to deal immediately
with the first of these: $E_{y_0|x_0} E_T U_1^2 = \mathrm{Var}(y_0|x_0) = \sigma^2$. It remains to deal with the
term $E_T(\hat y_0 - E_T \hat y_0)^2 = \mathrm{Var}_T(\hat y_0)$ in Equation 2.27. Since the bias $U_3 = 0$, we know that
$E_T \hat y_0 = x_0^T \beta$.
If $m$ is the $1 \times 1$-matrix with entry $\mu$, then $m m^T$ is the $1 \times 1$-matrix with entry $\mu^2$. It follows
from Equation 5 that the variance term in which we are interested is equal to
\[ E_T\big( x_0^T (X^T X)^{-1} X^T \varepsilon \varepsilon^T X (X^T X)^{-1} x_0 \big) . \]
Since $E_T = E_X E_{Y|X}$, and the expectation of $\varepsilon \varepsilon^T$ is $\sigma^2 I_N$, this is equal to
\[ \sigma^2 x_0^T E_T\big( (X^T X)^{-1} \big) x_0 = \sigma^2 x_0^T E_X\big( (X^T X / N)^{-1} \big) x_0 / N . \tag{8} \]
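We suppose, as stated by the authors, that the mean of the distribution giving rise to $X$
and $x_0$ is zero. For large $N$, $X^T X / N$ is then approximately equal to $\mathrm{Cov}(X) = \mathrm{Cov}(x_0)$,
the $p \times p$ variance-covariance matrix relating the $p$ components of a typical sample
vector $x$; as far as $E_X$ is concerned, this is a constant. Applying $E_{x_0}$ to Equation 8 as in
Equation 2.28, we obtain (approximately)
\begin{align*}
\sigma^2 E_{x_0}\big( x_0^T \mathrm{Cov}(X)^{-1} x_0 \big) / N
  &= \sigma^2 E_{x_0} \mathrm{trace}\big( x_0^T \mathrm{Cov}(X)^{-1} x_0 \big) / N \\
  &= \sigma^2 E_{x_0} \mathrm{trace}\big( \mathrm{Cov}(X)^{-1} x_0 x_0^T \big) / N \\
  &= \sigma^2 \mathrm{trace}\big( \mathrm{Cov}(X)^{-1} \mathrm{Cov}(x_0) \big) / N \tag{9} \\
  &= \sigma^2 \mathrm{trace}(I_p) / N \\
  &= \sigma^2 p / N .
\end{align*}
This completes our discussion of Equations 2.27 and 2.28.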
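The large-$N$ approximation $\sigma^2 p / N$ can be compared against a small Monte Carlo experiment. The sketch below, for $p = 2$ with standard Gaussian inputs, averages $\sigma^2 x_0^T (X^T X)^{-1} x_0$ over random training sets and random query points. The function names, the Gaussian input distribution, and the values of $N$, $\sigma$ and the number of repetitions are assumptions made for the illustration.

```python
import random

def quad_form_inverse(gram, v):
    """Return v^T gram^{-1} v for a 2x2 matrix gram."""
    det = gram[0][0] * gram[1][1] - gram[0][1] * gram[1][0]
    w0 = (gram[1][1] * v[0] - gram[0][1] * v[1]) / det
    w1 = (gram[0][0] * v[1] - gram[1][0] * v[0]) / det
    return v[0] * w0 + v[1] * w1

def average_variance(n_train, sigma, n_reps, seed=0):
    """Monte Carlo average of sigma^2 x0^T (X^T X)^{-1} x0 over X and x0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        # Fresh training inputs with zero mean and identity covariance.
        X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_train)]
        gram = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
        # Query point drawn from the same distribution as the training inputs.
        x0 = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        total += sigma ** 2 * quad_form_inverse(gram, x0)
    return total / n_reps

p, N, sigma = 2, 50, 1.0
estimate = average_variance(N, sigma, n_reps=2000)
approximation = sigma ** 2 * p / N
print(estimate, approximation)
```

The simulated average should be close to $\sigma^2 p / N$; for finite $N$ it tends to sit slightly above, since replacing $E_X\big((X^T X / N)^{-1}\big)$ by $\mathrm{Cov}(X)^{-1}$ is only an approximation.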
Notes on Local Methods in High Dimensions
The most common error metric used to compare different predictions of the true (but unknown)
function value $f(x_0)$ is the mean squared error (MSE). The unknown in the
above discussion is the specific mapping function $f(\cdot)$, which can be estimated by
various methods, many of which are discussed in the book. In supervised learning, to help
construct an appropriate prediction $\hat y_0$, we have access to a set of "training
samples" that carries randomness, in that these points are not under the complete
control of the experimenter. One can ask how much squared error our prediction at the
input point $x_0$ will have on average when we consider all possible training sets $T$.
We can compute this by inserting the "expected value of the predictor obtained over all training
sets", $E_T(\hat y_0)$, into the definition of the quadratic (MSE) error:
\begin{align*}
\mathrm{MSE}(x_0) &= E_T[f(x_0) - \hat y_0]^2 \\
  &= E_T[\hat y_0 - E_T(\hat y_0) + E_T(\hat y_0) - f(x_0)]^2 \\
  &= E_T[(\hat y_0 - E_T(\hat y_0))^2 + 2(\hat y_0 - E_T(\hat y_0))(E_T(\hat y_0) - f(x_0)) + (E_T(\hat y_0) - f(x_0))^2] \\
  &= E_T[(\hat y_0 - E_T(\hat y_0))^2] + (E_T(\hat y_0) - f(x_0))^2 .
\end{align*}
Here we have expanded the quadratic, distributed the expectation across all terms, and
noted that the middle term vanishes:
\[ E_T[2(\hat y_0 - E_T(\hat y_0))(E_T(\hat y_0) - f(x_0))] = 0 , \]
because the factor $E_T(\hat y_0) - f(x_0)$ is constant with respect to $T$ and
$E_T[\hat y_0 - E_T(\hat y_0)] = E_T(\hat y_0) - E_T(\hat y_0) = 0$. We are left with
\[ \mathrm{MSE}(x_0) = E_T[(\hat y_0 - E_T(\hat y_0))^2] + (E_T(\hat y_0) - f(x_0))^2 . \tag{10} \]
The first term in the above expression, $E_T[(\hat y_0 - E_T(\hat y_0))^2]$, is the variance of our estimator $\hat y_0$,
and the second term, $(E_T(\hat y_0) - f(x_0))^2$, is the squared bias of our estimator. This notion
of variance and bias with respect to our estimate $\hat y_0$ is to be understood relative to the possible
training sets $T$ and the specific computational method used to compute the estimate $\hat y_0$
from a given training set.
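The decomposition of the MSE into variance plus squared bias can be illustrated by simulating many training sets. The sketch below uses a deliberately biased estimator, a shrunken sample mean, which is a hypothetical choice made only so that both terms are nonzero; the value of $f(x_0)$, the noise level and the sample sizes are likewise assumptions.

```python
import random

def shrunken_mean(sample, shrink=0.8):
    """A deliberately biased estimator: the sample mean scaled by `shrink`."""
    return shrink * sum(sample) / len(sample)

rng = random.Random(2)
f_x0 = 3.0        # assumed true function value at x0
sigma = 1.0       # assumed noise standard deviation
n_train = 10      # responses observed at x0 per training set
n_sets = 5000     # number of simulated training sets

# One prediction per simulated training set T.
preds = []
for _ in range(n_sets):
    training_y = [f_x0 + rng.gauss(0.0, sigma) for _ in range(n_train)]
    preds.append(shrunken_mean(training_y))

# Empirical counterparts of E_T(y0_hat), MSE, variance and squared bias.
mean_pred = sum(preds) / n_sets
mse = sum((f_x0 - q) ** 2 for q in preds) / n_sets
variance = sum((q - mean_pred) ** 2 for q in preds) / n_sets
bias_sq = (mean_pred - f_x0) ** 2

print(mse, variance + bias_sq)
```

When $E_T$ is replaced by the average over the simulated training sets, the identity holds up to floating-point rounding, because the cross term cancels for the empirical mean exactly as it does in the derivation.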
Exercise Solutions
Ex. 2.1 (target coding)