100% satisfaction guarantee Immediately available after payment Both online and in PDF No strings attached 4.2 TrustPilot
logo-home
Exam (elaborations)

Solution Manual for Data Mining and Machine Learning: Fundamental Concepts and Algorithms 2nd Edition by Mohammed J. Zaki, Wagner Meira Jr

Rating
-
Sold
2
Pages
149
Grade
A+
Uploaded on
14-04-2025
Written in
2024/2025

Solution Manual for Data Mining and Machine Learning: Fundamental Concepts and Algorithms 2nd Edition by Mohammed J. Zaki, Wagner Meira Jr

Institution
Data Mining And Machine Learning: Fundamental Conc
Course
Data Mining and Machine Learning: Fundamental Conc

















Whoops! We can’t load your doc right now. Try again or contact support.

Written for

Institution
Data Mining and Machine Learning: Fundamental Conc
Course
Data Mining and Machine Learning: Fundamental Conc

Document information

Uploaded on
April 14, 2025
Number of pages
149
Written in
2024/2025
Type
Exam (elaborations)
Contains
Questions & answers

Subjects

  • solution manual

Content preview

,Contents




Contents 1

1 Data Mining and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.7 Exercises 3

PART I DATA ANALYSIS FOUNDATIONS 5

2 Numeric Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.7 Exercises 7

3 Categorical Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.7 Exercises 16

4 Graph Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.6 Exercises 20

5 Kernel Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.6 Exercises 26

6 High-dimensional Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.9 Exercises 29

7 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.6 Exercises 39

PART II FREQUENT PATTERN MINING 45

8 Itemset Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
8.5 Exercises 47

9 Summarizing Itemsets . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
9.6 Exercises 56

1

,2 Contents

10 Sequence Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
10.5 Exercises 63

11 Graph Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
11.5 Exercises 75

12 Pattern and Rule Assessment . . . . . . . . . . . . . . . . . . . . . . . . 84
12.4 Exercises 84

PART III CLUSTERING 89

13 Representative-based Clustering . . . . . . . . . . . . . . . . . . . . . . 91
13.5 Exercises 91

14 Hierarchical Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14.4 Exercises 99

15 Density-based Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 106
15.5 Exercises 106

16 Spectral and Graph Clustering . . . . . . . . . . . . . . . . . . . . . . . 111
16.5 Exercises 111

17 Clustering Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
17.5 Exercises 118

PART IV CLASSIFICATION 123
18 Probabilistic Classification . . . . . . . . . . . . . . . . . . . . . . . . . 125
18.5 Exercises 125

19 Decision Tree Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
19.4 Exercises 129

20 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . . . . . . 137
20.4 Exercises 137

21 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . 141
21.7 Exercises 141

22 Classification Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . 145
22.5 Exercises 145




This book has been published by Cambridge University Press. No unauthorized distribution shall be allowed.

,CHAPTER 1 Data Mining and Analysis




1.7 EXERCISES

Q1. Show that the mean of the centered data matrix Z in Eq. (1.5) is 0.

Answer: Each centered point is given as: zi = xi − µ. Their mean is therefore:
n n
1X 1X
zi = (xi − µ)
n n
i=0 i=0
n
1X 1
= xi − · n · µ
n n
i=0

= µ−µ= 0



Q2. Prove that for the Lp -distance in Eq. (1.2), we have

d 
δ∞ (x, y) = lim δp (x, y) = max |xi − yi |
p→∞ i=1

for x, y ∈ Rd .

Answer: We have to show that

d
! p1
X d 
lim |xi − yi |p = max |xi − yi |
p→∞ i=1
i=1

Assume that dimension a is the max, and let m = |xa − ya |. For simplicity, we
assume that |xi − yi | < m for all i 6= a.
If we divide and multiply the left hand side with mp we get:

! p1  1
d 
X  X  |xi − yi | p
p
|xi − yi | p
m p
= m 1 + 
m m
i=1 i6=a


3

,4 Data Mining and Analysis

 p
As p → ∞, each term |xi m−yi |
→ 0, since m > |xi − yi | for all i 6= a. The finite
P  |xi −yi | p
summation i6=a m converges to 0 as p → ∞, as does 1/p.
Thus δ∞ (x, y) = m × 10 = m = |xa − ya | = maxdi=1 {|xi − yi |}
Note that the same result is obtained even if we assume that dimensions other
than a achieve the maximum value m. In the worst case, we have m = |xi − yi | for
all d dimensions. In this case, the expression on LHS becomes

d
!1/p
X
lim m p
1 = lim md 1/p = lim md 0 = m
p→∞ p→∞ p→∞
i=1




This book has been published by Cambridge University Press. No unauthorized distribution shall be allowed.

,P A R T ONE DATA ANALYSIS
FOUNDATIONS

,
,CHAPTER 2 Numeric Attributes




2.7 EXERCISES

Q1. True or False:
(a) Mean is robust against outliers.
Answer: False
(b) Median is robust against outliers.
Answer: True
(c) Standard deviation is robust against outliers.
Answer: False

Q2. Let X and Y be two random variables, denoting age and weight, respectively.
Consider a random sample of size n = 20 from these two variables


X = (69, 74, 68, 70, 72, 67, 66, 70, 76, 68, 72, 79, 74, 67, 66, 71, 74, 75, 75, 76)
Y = (153, 175, 155, 135, 172, 150, 115, 137, 200, 130, 140, 265, 185, 112, 140,
150, 165, 185, 210, 220)


(a) Find the mean, median, and mode for X.
Answer: The mean, median, and mode are:
2
1 X
µ= 0xi = 1429/20 = 71.45
20
i=1

median = (71 + 72)/2 = 71.5
mode = 74



(b) What is the variance for Y?

7

,8 Numeric Attributes


Answer: The mean of Y is µY = 3294/20 = 164.7. The variance is:
2
1 X
σY2 = 0yi − µY = 27384.2/20 = 1369.21
20
i=1


(c) Plot the normal distribution for X.
Answer: The mean for X is µX = 71.45, and the variance is σX2 = 13.8475, with
a standard deviation of σX = 3.72.
f (x)




0.10




0.05




0 x
60 65 70 75 80
(d) What is the probability of observing an age of 80 or higher?
Answer: If we leverage the empirical probability mass function, we get:

P (X ≥ 80) = 0/20 = 0

since we do not have anyone with age 80 or more in our sample.
We can use the normal distribution modeling, with parameters µX = 71.45 and
σX2 = 3.72 to get:

Z∞
P (X ≥ 80) = N(x|µX , σX ) = 0.010769
80


b for these two
(e) Find the 2-dimensional mean µ̂ and the covariance matrix 6
variables.
Answer: The mean and covariance matrices are:

µ = (µX , µY )T = (71.45, 164.7)T
 2   
σX σXY 13.8475 122.435
6= =
σXY σY2 122.435 1369.21


(f) What is the correlation between age and weight?


This book has been published by Cambridge University Press. No unauthorized distribution shall be allowed.

, 2.7 Exercises 9


Answer:
122.435
ρXY = σXY /(σX σY ) = √ = 0.889
13.845 · 1369.21

(g) Draw a scatterplot to show the relationship between age and weight.
Answer: The scatterplot is shown in the figure below.
bC


250

230
bC

bC
210
Y: Weight




bC

190 bC bC

bC
bC
170 bC

bC bC
bC bC
150
bC bC
bCbC
bC
130
bC
bC
110
65 67 69 71 73 75 77 79
X: Age

Q3. Show that the identity in Eq. (2.15) holds, that is,

n
X n
X
(xi − µ)2 = n(µ̂ − µ)2 + (xi − µ̂)2
i=1 i=1


Answer: Consider the RHS
n
X n
X
n(µ̂ − µ)2 + (xi − µ̂)2 = n(µ̂2 − 2µ̂µ + µ2 ) + (xi2 − 2xi µ̂ + µ̂2 )
i=1 i=1
n
!
X
2 2
= nµ̂ − 2nµ̂µ + nµ + xi2 − 2nµ̂2 + nµ̂2
i=1
n
!
X
= xi2 − 2nµ̂µ + nµ2
i=1
n
!  Pn  n
X X
i=1 xi
= xi2 − 2n µ+ µ2
n
i=1 i=1
n
X
= (xi − µ)2
i=1



This book has been published by Cambridge University Press. No unauthorized distribution shall be allowed.

Get to know the seller

Seller avatar
Reputation scores are based on the amount of documents a seller has sold for a fee and the reviews they have received for those documents. There are three levels: Bronze, Silver and Gold. The better the reputation, the more your can rely on the quality of the sellers work.
AcademiContent Aalborg University
View profile
Follow You need to be logged in order to follow users or courses
Sold
3061
Member since
6 year
Number of followers
2132
Documents
1236
Last sold
23 hours ago

4.0

386 reviews

5
203
4
84
3
38
2
17
1
44

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their tests and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can instantly pick a different document that better fits what you're looking for.

Pay as you like, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card and download your PDF document instantly.

Student with book image

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions