Data Science 11 - Clustering algorithms
k-Means and variants; Initialization:
• Random: choose k points from X and use them as the initial means
• k-Means++: Pick the initial means so that they are spread out across the space (each new mean is drawn from X with probability proportional to its squared distance from the nearest mean already chosen). This typically leads to faster convergence
k-Means and variants; Representatives:
• k-Medoids or Partitioning Around Medoids (PAM): The cluster representatives are medoids (objects from X). Only the distances between objects are needed
Problems with k-Means:
• The clustering model, which assumes Gaussian-distributed clusters, does not always fit the data
CURE algorithm:
• Assumes a Euclidean distance
• Allows clusters to have any shape
• Uses a collection of representative points to represent clusters
CURE algorithm; Pass 1:
Pick a random sample of points that fits in main memory
• Initial clusters:
  - Cluster these points hierarchically to create initial clusters
• Pick representative points:
  - For each cluster, p
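The two initialization strategies listed under k-Means above can be sketched in plain Python. This is a minimal illustration, not the course's reference implementation: the function names (`sq_dist`, `random_init`, `kmeans_pp_init`, `kmeans`) are hypothetical, points are assumed to be tuples of floats, the distance is assumed to be Euclidean, and X is assumed to contain at least k distinct points.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, rng):
    """Plain initialization: k points of X chosen uniformly at random."""
    return rng.sample(points, k)


def kmeans_pp_init(points, k, rng):
    """k-Means++ seeding (assumes at least k distinct points).

    The first mean is uniform over X; each further mean is drawn with
    probability proportional to its squared distance from the nearest
    mean chosen so far, which tends to spread the means out.
    """
    means = [rng.choice(points)]
    while len(means) < k:
        weights = [min(sq_dist(p, m) for m in means) for p in points]
        means.append(rng.choices(points, weights=weights, k=1)[0])
    return means


def kmeans(points, means, max_iter=100):
    """Lloyd iterations: assign each point to its nearest mean, then
    recompute every mean as the centroid of its cluster."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: sq_dist(p, means[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old mean (one simple convention).
        new_means = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else m
            for cl, m in zip(clusters, means)
        ]
        if new_means == means:  # assignments are stable: converged
            break
        means = new_means
    return means, clusters
```

A call such as `kmeans(X, kmeans_pp_init(X, 3, random.Random(0)))` runs the sketch with k-Means++ seeding; swapping in `random_init` gives the plain random variant, so the two initializations can be compared on the same data.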
Written for
- Institution
- Data Science 11 - Clustering algorithms k-Means an
- Course
- Data Science 11 - Clustering algorithms k-Means an
Document information
- Uploaded on
- March 20, 2024
- Number of pages
- 5
- Written in
- 2023/2024
- Type
- Exam (elaborations)
- Contains
- Questions & answers