Publication | Open Access
Parallel algorithms for hierarchical clustering
406
Citations
19
References
1995
Year
Cluster ComputingMinimum Variance MetricsDocument ClusteringEngineeringData ScienceData MiningParallel Complexity TheoryParallel ProcessingHierarchical ClusteringParallel ProgrammingComputer ScienceParallel ComputingComputational GeometryParallel MetaheuristicsOptimal Pram AlgorithmsParallel AlgorithmsCluster Technology
Hierarchical clustering is a common method for grouping similar data points in multidimensional spaces, with known O(n²) algorithms. This paper reviews sequential algorithms and describes prior and new parallel algorithms for hierarchical clustering. Parallel algorithms for various distance metrics are presented, including PRAM, butterfly, and tree implementations. Optimal PRAM, butterfly, and tree algorithms using n/log n processors achieve asymptotic speedups for average, complete, centroid, median, minimum variance, and single‑link metrics.
Hierarchical clustering is a common method used to determine clusters of similar data points in multidimensional spaces. O(n2) algorithms are known for this problem [3,4,11,19]. This paper reviews important results for sequential algorithms and describes previous work on parallel algorithms for hierarchical clustering. Parallel algorithms to perform hierarchical clustering using several distance metrics are then described. Optimal PRAM algorithms using n/log n processors are given for the average link, complete link, centroid, median, and minimum variance metrics. Optimal butterfly and tree algorithms using n/log n processors are given for the centroid, median, and minimum variance metrics. Optimal asymptotic speedups are achieved for the best practical algorithm to perform clustering using the single link metric on a n/log n processor PRAM, butterfly, or tree.
| Year | Citations | |
|---|---|---|
Page 1
Page 1