Fundamental Concepts and Algorithms, 2nd Edition, by Mohammed J. Zaki and Wagner Meira Jr.
Contents
1 Data Mining and Analysis
1.7 Exercises
PART I DATA ANALYSIS FOUNDATIONS
2 Numeric Attributes
2.7 Exercises
3 Categorical Attributes
3.7 Exercises
4 Graph Data
4.6 Exercises
5 Kernel Methods
5.6 Exercises
6 High-dimensional Data
6.9 Exercises
7 Dimensionality Reduction
7.6 Exercises
PART II FREQUENT PATTERN MINING
8 Itemset Mining
8.5 Exercises
9 Summarizing Itemsets
9.6 Exercises
10 Sequence Mining
10.5 Exercises
11 Graph Pattern Mining
11.5 Exercises
12 Pattern and Rule Assessment
12.4 Exercises
PART III CLUSTERING
13 Representative-based Clustering
13.5 Exercises
14 Hierarchical Clustering
14.4 Exercises
15 Density-based Clustering
15.5 Exercises
16 Spectral and Graph Clustering
16.5 Exercises
17 Clustering Validation
17.5 Exercises
PART IV CLASSIFICATION
18 Probabilistic Classification
18.5 Exercises
19 Decision Tree Classifier
19.4 Exercises
20 Linear Discriminant Analysis
20.4 Exercises
21 Support Vector Machines
21.7 Exercises
22 Classification Assessment
22.5 Exercises
This book has been published by Cambridge University Press. No unauthorized distribution shall be allowed.
CHAPTER 1 Data Mining and Analysis
1.7 EXERCISES
Q1. Show that the mean of the centered data matrix Z in Eq. (1.5) is 0.
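Answer: Each centered point is given as $z_i = x_i - \mu$, where $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the mean of the data. The mean of the centered points is therefore:
$$
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} z_i &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu) \\
&= \frac{1}{n}\sum_{i=1}^{n} x_i - \frac{1}{n}\cdot n \cdot \mu \\
&= \mu - \mu = 0
\end{aligned}
$$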
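As a quick numerical sanity check (not part of the original solution), the sketch below centers a small data matrix with NumPy and confirms that every column of Z averages to 0 up to floating-point round-off. The matrix values and the helper name `center` are illustrative assumptions; NumPy broadcasting subtracts the mean vector from each row, mirroring $z_i = x_i - \mu$.

```python
import numpy as np

def center(D):
    """Return the centered matrix Z whose rows are z_i = x_i - mu."""
    mu = D.mean(axis=0)      # mean vector, one entry per attribute
    return D - mu            # broadcasting subtracts mu from every row x_i

# Arbitrary 5 x 3 example data (rows are points x_i); the values are made up.
D = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.5, -1.0],
              [2.5, 3.5, 0.0],
              [0.0, 1.0, 2.0],
              [7.0, -2.0, 4.5]])

Z = center(D)
print(Z.mean(axis=0))        # entries are 0 up to floating-point round-off
```

Because the mean is computed in floating point, the printed entries may be tiny nonzero values rather than exact zeros, so a tolerance-based comparison such as `np.allclose` is the appropriate check.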
Q2. Prove that for the $L_p$-distance in Eq. (1.2), we have
$$
\delta_\infty(x, y) = \lim_{p \to \infty} \delta_p(x, y) = \max_{i=1}^{d} |x_i - y_i|
$$
for $x, y \in \mathbb{R}^d$.
Answer: We have to show that
$$
\lim_{p \to \infty} \left( \sum_{i=1}^{d} |x_i - y_i|^p \right)^{1/p} = \max_{i=1}^{d} |x_i - y_i|
$$
Let $a$ be a dimension that attains the maximum, and let $m = |x_a - y_a|$. If $m = 0$, then $x = y$ and both sides are 0, so assume $m > 0$. For simplicity, we also assume that $|x_i - y_i| < m$ for all $i \neq a$.
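Multiplying and dividing each term of the sum on the left-hand side by $m^p$, we get:
$$
\left( \sum_{i=1}^{d} |x_i - y_i|^p \right)^{1/p}
= \left( m^p \sum_{i=1}^{d} \frac{|x_i - y_i|^p}{m^p} \right)^{1/p}
= m \left( 1 + \sum_{i \neq a} \left( \frac{|x_i - y_i|}{m} \right)^p \right)^{1/p}
$$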
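The factored form above is also a practical way to evaluate $\delta_p$ numerically: every ratio $|x_i - y_i|/m$ lies in $[0, 1]$, so raising it to a large power cannot overflow, whereas computing $|x_i - y_i|^p$ directly overflows double precision for large $p$ whenever some difference exceeds 1. The sketch below illustrates that idea and is not code from the book; the function name `lp_distance` and the sample points are hypothetical. It sums over all $d$ terms, so ties for the maximum are handled as well (with $k$ tied dimensions the sum starts at $k$, and $k^{1/p} \to 1$).

```python
import numpy as np

def lp_distance(x, y, p):
    """delta_p(x, y) computed as m * (sum_i (|x_i - y_i| / m)^p)^(1/p),
    where m = max_i |x_i - y_i| (the factored form from the proof)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    m = diff.max()
    if m == 0.0:                 # x == y, so every L_p distance is 0
        return 0.0
    ratios = diff / m            # each ratio lies in [0, 1]; the max term equals 1
    return float(m * np.sum(ratios ** p) ** (1.0 / p))

# Hypothetical example points; max_i |x_i - y_i| = 4 here.
x = [1.0, 5.0, 2.0]
y = [4.0, 1.0, 2.5]
for p in (1, 2, 10, 100, 1000):
    print(p, lp_distance(x, y, p))   # values approach 4 as p grows
```

For comparison, evaluating `np.sum(diff ** p) ** (1.0 / p)` directly on these points with p = 1000 would overflow to `inf`, since $4^{1000}$ exceeds the double-precision range.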