Clustering – K-means, Nearest Neighbor and Hierarchical.
Exercise 1. K-means clustering
Use the k-means algorithm and Euclidean distance to cluster the following 8 examples into 3 clusters:
A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).
The distance matrix based on the Euclidean distance is given below:
A1 A2 A3 A4 A5 A6 A7 A8
A1 0 25 36 13 50 52 65 5
A2 0 37 18 25 17 10 20
A3 0 25 2 2 53 41
A4 0 13 17 52 2
A5 0 2 45 25
A6 0 29 29
A7 0 58
A8 0
Suppose that the initial seeds (centers of each cluster) are A1, A4 and A7. Run the k-means algorithm for
1 epoch only. At the end of this epoch show:
a) The new clusters (i.e. the examples belonging to each cluster)
b) The centers of the new clusters
c) Draw a 10 by 10 space with all the 8 points and show the clusters after the first epoch and the new
centroids.
d) How many more iterations are needed to converge? Draw the result for each epoch.
Solution:
a)
d(a,b) denotes the Eucledian distance between a and b. It is obtained directly from the distance matrix or
calculated as follows: d(a,b)=sqrt((xb-xa)2+(yb-ya)2))
seed1=A1=(2,10), seed2=A4=(5,8), seed3=A7=(1,2)
epoch1 – start:
A1: A2:
d(A1, seed1)=0 as A1 is seed1 d(A2,seed1)= 25 = 5
d(A1, seed2)= 13 >0 d(A2, seed2)= 18 = 4.24
d(A1, seed3)= 65 >0 d(A2, seed3)= 10 = 3.16 Í smaller
ÎA1 ∈ cluster1 Î A2 ∈ cluster3
A3: A4:
d(A3, seed1)= 36 = 6 d(A4, seed1)= 13
d(A3, seed2)= 25 = 5 Í smaller d(A4, seed2)=0 as A4 is seed2
d(A3, seed3)= 53 = 7.28 d(A4, seed3)= 52 >0
Î A3 ∈ cluster2 Î A4 ∈ cluster2
A5: A6:
d(A5, seed1)= 50 = 7.07 d(A6, seed1)= 52 = 7.21
, d(A5, seed2)= 13 = 3.60 Í smaller d(A6, seed2)= 17 = 4.12 Í smaller
d(A5, seed3)= 45 = 6.70 d(A6, seed3)= 29 = 5.38
Î A5 ∈ cluster2 Î A6 ∈ cluster2
A7: A8:
d(A7, seed1)= 65 >0 d(A8, seed1)= 5
d(A7, seed2)= 52 >0 d(A8, seed2)= 2 Í smaller
d(A7, seed3)=0 as A7 is seed3 d(A8, seed3)= 58
Î A7 ∈ cluster3 Î A8 ∈ cluster2
end of epoch1
new clusters: 1: {A1}, 2: {A3, A4, A5, A6, A8}, 3: {A2, A7}
b) centers of the new clusters:
C1= (2, 10), C2= ((8+5+7+6+4)/5, (4+8+5+4+9)/5) = (6, 6), C3= ((2+1)/2, (5+2)/2) = (1.5, 3.5)
c)
A A1
10 10
1
A8 A8
9 9
A A4
8 4 8
7 7
6 6
A2 A5 A A5
5 5 2
A6 A3 A6 A3
4 4
3 3
A7 A7
2 2
1 1
0 0
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
A1 A1
10 10 x
A8 A8
9 9
A4 A4
8 8
7 7
6 6 x
A A5 A2 A5
5 5
2 A A3
A6 A6
4 3 4
x
3 3
A7 A7
2 2
1 1
0 0
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10