Publication | Closed Access
Automatic subspace clustering of high dimensional data for data mining applications
Citations: 2.4K
References: 42
Year: 1998
Unknown Venue
Cluster Computing, Cluster Descriptions, Engineering, Machine Learning, Canonical Data Distribution, Automatic Subspace Clustering, Unsupervised Machine Learning, Text Mining, Optimization-based Data Mining, Data Science, Data Mining, Pattern Recognition, Statistics, Document Clustering, High Dimensional Data, Knowledge Discovery, Computer Science, Dimensionality Reduction, High-dimensional Method, Structure Discovery, Data Mining Applications, Big Data
Data mining requires clustering algorithms that can find clusters embedded in subspaces of high-dimensional data, scale efficiently, produce comprehensible results, avoid distribution assumptions, and be insensitive to the order of input records. The authors introduce CLIQUE, a clustering algorithm designed to meet these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality, describes them as minimized DNF expressions for interpretability, and produces identical results regardless of input order without presuming any particular data distribution. Experiments show that CLIQUE efficiently finds accurate clusters in large high-dimensional datasets.
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
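The abstract does not spell out how dense regions in subspaces are located. The following is a minimal Python sketch of one way to do it, not the authors' implementation. It assumes a grid-based, bottom-up search: each dimension is cut into equal-width intervals, and a higher-dimensional cell is tested only if all of its lower-dimensional projections are already dense. The function name `dense_units` and the parameters `xi` (intervals per dimension), `tau` (density threshold), `lo`, and `hi` are illustrative assumptions, not taken from the text above.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi=4, tau=0.2, lo=0.0, hi=1.0):
    """Return {subspace dimensions: set of dense grid cells}.

    points : list of equal-length tuples with values in [lo, hi]
    xi     : number of equal-width intervals per dimension (assumed)
    tau    : minimum fraction of points a cell needs to be dense (assumed)
    """
    n = len(points)
    d = len(points[0])
    width = (hi - lo) / xi

    def cell(value):
        # Clamp so that value == hi lands in the last interval.
        return min(int((value - lo) / width), xi - 1)

    grid = [tuple(cell(v) for v in p) for p in points]

    # A unit is a tuple of (dimension, interval) pairs, sorted by dimension.
    # Level 1: count points per interval in every single dimension.
    counts = Counter(((dim, g[dim]),) for g in grid for dim in range(d))
    current = {u for u, c in counts.items() if c / n >= tau}

    result = {}
    while current:
        for unit in current:
            dims = tuple(dim for dim, _ in unit)
            result.setdefault(dims, set()).add(tuple(i for _, i in unit))

        # Join units that share all but their last pair to form candidates
        # one dimension higher, then prune any candidate that has a
        # non-dense lower-dimensional projection.
        candidates = set()
        for a, b in combinations(sorted(current), 2):
            if a[:-1] == b[:-1] and a[-1][0] < b[-1][0]:
                cand = a + (b[-1],)
                projections = combinations(cand, len(cand) - 1)
                if all(p in current for p in projections):
                    candidates.add(cand)

        # Count the points that fall inside each surviving candidate.
        counts = Counter()
        for g in grid:
            for cand in candidates:
                if all(g[dim] == i for dim, i in cand):
                    counts[cand] += 1
        current = {u for u, c in counts.items() if c / n >= tau}
    return result


# Toy data: two groups that are tight in dimensions 0 and 1
# but spread out along dimension 2.
points = [
    (0.1, 0.1, 0.1), (0.1, 0.2, 0.3), (0.2, 0.1, 0.6), (0.2, 0.2, 0.9),
    (0.8, 0.8, 0.1), (0.9, 0.8, 0.3), (0.8, 0.9, 0.6), (0.9, 0.9, 0.9),
]
units = dense_units(points, xi=4, tau=0.4)
best_subspace = max(units, key=len)  # subspace with the most dimensions
print(best_subspace, sorted(units[best_subspace]))
```

In this toy run no interval of dimension 2 holds enough points, so the search never extends into it and the highest-dimensional subspace with dense cells is the pair of dimensions 0 and 1. A fuller treatment would then merge adjacent dense cells into clusters and derive the minimized DNF descriptions mentioned in the abstract. Both steps are omitted here.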