1. Hierarchical clustering
- Avoids choosing the number of clusters beforehand
- Dendrograms help visualize different clustering granularities (no need to rerun the algorithm)
- Most algorithms allow the user to choose any distance metric (k-means restricts us to Euclidean distance)
- Can often find more complex shapes than k-means or Gaussian mixture models
Divisive (top-down):
start with all data in one big cluster and recursively split it (e.g., recursive k-means)
- which algorithm to apply recursively
- how many clusters per split
- when to split vs. stop: max cluster size, max cluster radius, or a specified number of clusters
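A minimal sketch of the divisive idea, assuming 2-means as the algorithm to recurse, two clusters per split, and a maximum cluster size as the stopping rule. The names `two_means`, `divisive`, and `max_size` are illustrative, and the deterministic initialization is a simplification chosen only to keep the example reproducible.

```python
import math

def two_means(points, iters=20):
    """Split points into two groups with a basic 2-means (Lloyd) loop."""
    # Assumed deterministic init: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    left, right = list(points), []
    for _ in range(iters):
        # Assignment step: each point goes to its nearer centroid.
        left = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        right = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not left or not right:
            break
        # Update step: recompute each centroid as the mean of its group.
        new_c0 = tuple(sum(x) / len(left) for x in zip(*left))
        new_c1 = tuple(sum(x) / len(right) for x in zip(*right))
        if (new_c0, new_c1) == (c0, c1):
            break  # assignments have converged
        c0, c1 = new_c0, new_c1
    return left, right

def divisive(points, max_size):
    """Recursively split until every cluster has at most max_size points."""
    if len(points) <= max_size:
        return [points]
    left, right = two_means(points)
    if not left or not right:  # cannot split further (e.g. duplicate points)
        return [points]
    return divisive(left, max_size) + divisive(right, max_size)

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters = divisive(data, max_size=3)  # two clusters of three points each
```

Swapping the stopping rule (for example, a maximum cluster radius) only changes the first check in `divisive`; the recursion itself stays the same.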
Agglomerative (bottom-up):
start with each data point as its own cluster, then merge clusters until all points are in one big cluster (e.g., single linkage)
Single linkage:
- initialize each point to be its own cluster
- define the distance between two clusters C1 and C2 to be the minimum distance between any point in C1 and any point in C2
- merge the two closest clusters
- repeat the merge step until all points are in one cluster
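The steps above can be sketched in plain Python. This is a naive version (roughly cubic in the number of points) meant only to mirror the description; the function name `single_linkage` and the pluggable `dist` parameter are assumptions made for illustration.

```python
import math

def single_linkage(points, dist=math.dist):
    """Naive single-linkage agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]  # each point is its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster distance = minimum distance over all cross-cluster pairs.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Merge the two closest clusters and continue.
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

points = [(0.0,), (1.0,), (5.0,), (7.0,)]
history = single_linkage(points)
# Merge order: points 0 and 1 first, then 2 and 3, then the two pairs.

# Any distance function can be passed in, e.g. Manhattan distance:
def manhattan(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))

history_l1 = single_linkage(points, dist=manhattan)
```

Because the metric is just a function argument, nothing in the merge loop depends on Euclidean geometry.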
Dendrogram
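The x axis shows the data points (carefully ordered so that merged clusters sit next to each other).
The y axis shows the distance between the pair of clusters joined at each merge.
The path from a point upward shows all clusters to which that point belongs and the order in which the clusters merge.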
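To make the three statements above concrete, here is a small sketch that works from a merge history. The `merges` list is hypothetical data, and the helper names `leaf_order`, `path`, and `cut` are illustrative; `cut` assumes merge distances never decrease, which holds for single linkage.

```python
# Hypothetical merge history: (left_cluster, right_cluster, merge_distance),
# listed in the order the merges happened.
merges = [
    (("a",), ("c",), 1.0),
    (("b",), ("d",), 2.0),
    (("a", "c"), ("b", "d"), 4.0),
]

def leaf_order(merges):
    """x axis: order leaves so every merged cluster is a contiguous span."""
    order = {}  # frozenset of members -> ordered list of leaves
    for left, right, _ in merges:
        l = order.pop(frozenset(left), list(left))
        r = order.pop(frozenset(right), list(right))
        order[frozenset(left) | frozenset(right)] = l + r
    return next(iter(order.values()))

def path(point, merges):
    """Clusters containing `point`, from its first merge up to the root."""
    return [(left + right, d) for left, right, d in merges
            if point in left + right]

def cut(merges, height):
    """y axis: clusters obtained by keeping only merges at or below `height`."""
    clusters = {frozenset([leaf]) for left, right, _ in merges
                for leaf in left + right}
    for left, right, d in merges:
        if d <= height:
            clusters -= {frozenset(left), frozenset(right)}
            clusters.add(frozenset(left + right))
    return sorted(sorted(c) for c in clusters)

print(leaf_order(merges))   # ['a', 'c', 'b', 'd']
print(path("b", merges))    # [(('b', 'd'), 2.0), (('a', 'c', 'b', 'd'), 4.0)]
print(cut(merges, 3.0))     # [['a', 'c'], ['b', 'd']]
```

Calling `cut` with different heights gives coarser or finer clusterings from the same merge history, which is why the clustering does not need to be rerun.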
Time: 2024-10-19 11:12:35