Publication | Closed Access
Fast distributed k -center clustering with outliers on massive data
57
Citations
26
References
2015
Year
Cluster ComputingEngineeringDistributed AlgorithmsComputation MethodsNetwork AnalysisMap-reduceLarge DataDistributed Data AnalyticsCluster TechnologyData ScienceData MiningParallel ComputingDistributed Computing FrameworkDocument ClusteringOutlier DetectionKnowledge DiscoveryComputer ScienceDistributed ProcessingCloud ComputingParallel ProgrammingMassive DataMassive Data ProcessingBig Data
Clustering large data is a fundamental problem with a vast number of applications. Due to the increasing size of data, practitioners interested in clustering have turned to distributed computation methods. In this work, we consider the widely used k-center clustering problem and its variant used to handle noisy data, k-center with outliers. In the noise-free setting we demonstrate how a previously-proposed distributed method is actually an O(1)-approximation algorithm, which accurately explains its strong empirical performance. Additionally, in the noisy setting, we develop a novel distributed algorithm that is also an O(1)-approximation. These algorithms are highly parallel and lend themselves to virtually any distributed computing framework. We compare each empirically against the best known sequential clustering methods and show that both distributed algorithms are consistently close to their sequential versions. The algorithms are all one can hope for in distributed settings: they are fast, memory efficient and they match their sequential counterparts.
| Year | Citations | |
|---|---|---|
Page 1
Page 1