Exam (elaborations) TEST BANK FOR Pattern Classification 2nd Edition By David G. Stork (Solution Manual)
Contents

1 Introduction
2 Bayesian decision theory: Problem Solutions; Computer Exercises
3 Maximum likelihood and Bayesian parameter estimation: Problem Solutions; Computer Exercises
4 Nonparametric techniques: Problem Solutions; Computer Exercises
5 Linear discriminant functions: Problem Solutions; Computer Exercises
6 Multilayer neural networks: Problem Solutions; Computer Exercises
7 Stochastic methods: Problem Solutions; Computer Exercises
8 Nonmetric methods: Problem Solutions; Computer Exercises
9 Algorithm-independent machine learning: Problem Solutions; Computer Exercises
10 Unsupervised learning and clustering: Problem Solutions; Computer Exercises
Sample final exams and solutions
Worked examples
Errata and amendments in the text: First and second printings; Fifth printing

Chapter 1 Introduction

Problem Solutions

There are neither problems nor computer exercises in Chapter 1.

Chapter 2 Bayesian decision theory

Problem Solutions

Section 2.1

1. Equation 7 in the text states

P(error|x) = min[P(ω1|x), P(ω2|x)].

(a) We assume, without loss of generality, that for a given particular x we have P(ω2|x) ≥ P(ω1|x), and thus P(error|x) = P(ω1|x). We have, moreover, the normalization condition P(ω1|x) = 1 − P(ω2|x). Together these imply P(ω2|x) ≥ 1/2, or 2P(ω2|x) ≥ 1, and hence

2P(ω2|x)P(ω1|x) ≥ P(ω1|x) = P(error|x).

This is true at every x, and hence the integrals obey

∫ 2P(ω2|x)P(ω1|x) dx ≥ ∫ P(error|x) dx.

In short, 2P(ω2|x)P(ω1|x) provides an upper bound for P(error|x).

(b) From part (a) we have P(ω2|x) ≥ 1/2. For αP(ω1|x)P(ω2|x) to bound P(error|x) = P(ω1|x) from above we would need αP(ω2|x) ≥ 1, that is, P(ω2|x) ≥ 1/α; for α < 2 this need not hold, since 1/α > 1/2. Take as an example α = 4/3 and P(ω1|x) = 0.4, and hence P(ω2|x) = 0.6. In this case P(error|x) = 0.4, while

αP(ω1|x)P(ω2|x) = (4/3) × 0.6 × 0.4 = 0.32 < P(error|x).

Thus αP(ω1|x)P(ω2|x) with α < 2 does not provide an upper bound for all values of P(ω1|x).

(c) Let P(error|x) = P(ω1|x). Since P(ω2|x) ≤ 1, for all x we have

P(ω2|x)P(ω1|x) ≤ P(ω1|x) = P(error|x)
∫ P(ω2|x)P(ω1|x) dx ≤ ∫ P(error|x) dx,

and we have a lower bound.

(d) The solution to part (b) also applies here.
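As a quick numerical check of parts (a) and (c), a minimal sketch in Python (assuming nothing beyond a grid of posterior values) confirms that 2P(ω1|x)P(ω2|x) stays above the error probability and P(ω1|x)P(ω2|x) stays below it:

    import numpy as np

    # Sweep the posterior P(omega_1|x) over a grid; P(omega_2|x) follows by normalization.
    p1 = np.linspace(0.0, 1.0, 1001)
    p2 = 1.0 - p1

    p_error = np.minimum(p1, p2)   # Eq. 7: P(error|x) = min[P(omega_1|x), P(omega_2|x)]
    upper = 2.0 * p1 * p2          # candidate upper bound from part (a)
    lower = p1 * p2                # candidate lower bound from part (c)

    assert np.all(upper >= p_error - 1e-12)   # the upper bound holds at every posterior value
    assert np.all(lower <= p_error + 1e-12)   # the lower bound holds at every posterior value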
Section 2.2

2. We are given that the density is of the form

p(x|ωi) = k e^{−|x−ai|/bi}.

(a) We seek k so that the function is normalized, as required of a true density. We integrate this function and set the result to 1:

k [ ∫_{−∞}^{ai} exp[(x − ai)/bi] dx + ∫_{ai}^{∞} exp[−(x − ai)/bi] dx ] = 1,

which yields 2bik = 1, or k = 1/(2bi). Note that the normalization is independent of ai, which corresponds to a shift along the axis and is hence indeed irrelevant to normalization. The distribution is therefore written

p(x|ωi) = (1/(2bi)) e^{−|x−ai|/bi}.

(b) The likelihood ratio can be written directly:

p(x|ω1)/p(x|ω2) = (b2/b1) exp[−|x − a1|/b1 + |x − a2|/b2].

(c) For the case a1 = 0, a2 = 1, b1 = 1 and b2 = 2, the likelihood ratio is

p(x|ω1)/p(x|ω2) = 2e^{(x+1)/2}   for x ≤ 0
                = 2e^{(1−3x)/2}  for 0 < x ≤ 1
                = 2e^{−(x+1)/2}  for x > 1,

as shown in the figure.

[Figure: the likelihood ratio p(x|ω1)/p(x|ω2) plotted as a function of x for these parameter values.]

Section 2.3

3. We are to use the standard zero-one classification cost, that is, λ11 = λ22 = 0 and λ12 = λ21 = 1.

(a) We have the priors P(ω1) and P(ω2) = 1 − P(ω1). The Bayes risk is given by Eqs. 12 and 13 in the text:

R(P(ω1)) = P(ω1) ∫_{R2} p(x|ω1) dx + (1 − P(ω1)) ∫_{R1} p(x|ω2) dx.

To obtain the prior with the minimum risk, we take the derivative with respect to P(ω1) and set it to 0, that is,

d/dP(ω1) R(P(ω1)) = ∫_{R2} p(x|ω1) dx − ∫_{R1} p(x|ω2) dx = 0,

which gives the desired result:

∫_{R2} p(x|ω1) dx = ∫_{R1} p(x|ω2) dx.

(b) This solution is not always unique, as shown in the following simple counterexample. Let P(ω1) = P(ω2) = 0.5 and

p(x|ω1) = 1 for −0.5 ≤ x ≤ 0.5, and 0 otherwise,
p(x|ω2) = 1 for 0 ≤ x ≤ 1, and 0 otherwise.

It is easy to verify that both R1 = [−0.5, 0.25] and R1 = [0, 0.5] satisfy the equation in part (a); thus the solution is not unique.

4. Consider the minimax criterion for a two-category classification problem.

(a) The total risk is the integral over the two regions Ri of the posteriors times their costs:

R = ∫_{R1} [λ11 P(ω1)p(x|ω1) + λ12 P(ω2)p(x|ω2)] dx + ∫_{R2} [λ21 P(ω1)p(x|ω1) + λ22 P(ω2)p(x|ω2)] dx.

We use ∫_{R2} p(x|ω2) dx = 1 − ∫_{R1} p(x|ω2) dx and P(ω2) = 1 − P(ω1), and regroup to find:

R = λ22 + λ12 ∫_{R1} p(x|ω2) dx − λ22 ∫_{R1} p(x|ω2) dx
    + P(ω1) [ (λ11 − λ22) − λ11 ∫_{R2} p(x|ω1) dx − λ12 ∫_{R1} p(x|ω2) dx + λ21 ∫_{R2} p(x|ω1) dx + λ22 ∫_{R1} p(x|ω2) dx ]

  = λ22 + (λ12 − λ22) ∫_{R1} p(x|ω2) dx
    + P(ω1) [ (λ11 − λ22) + (λ21 − λ11) ∫_{R2} p(x|ω1) dx + (λ22 − λ12) ∫_{R1} p(x|ω2) dx ].

(b) Consider an arbitrary prior 0 < P*(ω1) < 1, and assume the decision boundary has been set so as to achieve the minimal (Bayes) error for that prior. If one holds that decision boundary fixed but changes the prior probability P(ω1), then the error changes linearly in P(ω1), as given by the formula in part (a). The true Bayes error, however, must be less than or equal to that (linearly bounded) value, since one has the freedom to change the decision boundary at each value of P(ω1). Moreover, we note that the Bayes error is 0 at P(ω1) = 0 and at P(ω1) = 1, since the Bayes decision rule under those conditions is to always decide ω2 or ω1, respectively, and this gives zero error. Thus the curve of the Bayes error rate is concave down for all prior probabilities.

[Figure: the Bayes error E(P(ω1)) as a function of the prior P(ω1); it lies on or below the straight line obtained by holding fixed the decision boundary chosen for the prior P*(ω1).]
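A minimal sketch, assuming two illustrative class-conditional Gaussians p(x|ω1) = N(0,1) and p(x|ω2) = N(2,1) and a fixed decision point x* = 1 (none of these values come from the text), shows the linear dependence of the risk of part (a) on the prior under the zero-one loss:

    import numpy as np
    from scipy.stats import norm

    # Assumed example densities, for illustration only: p(x|w1) = N(0,1), p(x|w2) = N(2,1).
    mu1, mu2, sigma = 0.0, 2.0, 1.0
    x_star = 1.0                              # fixed decision point: R1 = (-inf, x*), R2 = (x*, inf)

    eps1 = norm.sf(x_star, mu1, sigma)        # integral over R2 of p(x|w1) dx
    eps2 = norm.cdf(x_star, mu2, sigma)       # integral over R1 of p(x|w2) dx

    # With zero-one loss (lam11 = lam22 = 0, lam12 = lam21 = 1), part (a) reduces to
    # R(P(w1)) = eps2 + P(w1) * (eps1 - eps2), a straight line in the prior.
    priors = np.linspace(0.0, 1.0, 11)
    risk = eps2 + priors * (eps1 - eps2)
    print(np.round(risk, 4))                  # linear in P(w1); the Bayes risk curve lies on or below it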
(c) According to the general minimax equation in part (a), for our case (i.e., λ11 = λ22 = 0 and λ12 = λ21 = 1) the decision boundary is chosen to satisfy

∫_{R2} p(x|ω1) dx = ∫_{R1} p(x|ω2) dx.

We assume that a single decision point suffices, and thus we seek to find x* such that

∫_{−∞}^{x*} N(μ1, σ1²) dx = ∫_{x*}^{∞} N(μ2, σ2²) dx,

where, as usual, N(μi, σi²) denotes a Gaussian. We assume for definiteness and without loss of generality that μ2 > μ1, and that the single decision point lies between the means. Recall the definition of the error function, given by Eq. 96 in the Appendix of the text, that is,

erf(x) = (2/√π) ∫_0^x e^{−t²} dt.
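A minimal sketch, assuming illustrative parameters μ1 = 0, σ1 = 1, μ2 = 2, σ2 = 1.5 (not values from the text), finds the decision point x* of part (c) by root finding rather than through the error function:

    from scipy.stats import norm
    from scipy.optimize import brentq

    # Assumed example parameters, for illustration only.
    mu1, s1 = 0.0, 1.0
    mu2, s2 = 2.0, 1.5

    # Condition on the single decision point x*:
    # integral_{-inf}^{x*} N(mu1, s1^2) dx  =  integral_{x*}^{inf} N(mu2, s2^2) dx.
    def condition(x):
        return norm.cdf(x, mu1, s1) - norm.sf(x, mu2, s2)

    x_star = brentq(condition, mu1, mu2)   # the root lies between the means, as assumed above
    print(x_star)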
Document information
- Uploaded on: November 16, 2021
- Number of pages: 443
- Written in: 2021/2022
- Type: Exam
- Contains: Unknown
Topics
- exam elaborations
- test bank for pattern classification 2nd edition by david g stork solution manual