Publication | Closed Access
Computing Clusters of Correlation Connected Objects
Citations: 166
References: 24
Year: 2004
Unknown Venue
Cluster Computing, Different Subgroups, Engineering, Machine Learning, Network Analysis, Correlation Connected Objects, Unsupervised Machine Learning, Text Mining, Local Subgroups, Cluster Technology, Optimization-based Data Mining, Data Science, Data Mining, Pattern Recognition, Biostatistics, Principal Component Analysis, Computational Geometry, Document Clustering, Knowledge Discovery, Computer Science, Feature Construction, Computational Science, Structure Discovery, Feature Vectors
The detection of correlations between different features in a set of feature vectors is an important data mining task because correlation indicates a dependency between the features or some association of cause and effect between them. This association can be arbitrarily complex, i.e. one or more features might depend on a combination of several other features. Well-known methods such as principal component analysis (PCA) can reliably find correlations that are global, linear, not hidden in a set of noise vectors, and uniform, i.e. the same type of correlation is exhibited in all feature vectors. In many applications such as medical diagnosis, molecular biology, time sequences, or electronic commerce, however, correlations are not global, since the dependency between features can differ between subgroups of the set. In this paper, we propose a method called 4C (Computing Correlation Connected Clusters) to identify local subgroups of the data objects sharing a uniform but arbitrarily complex correlation. Our algorithm is based on a combination of PCA and density-based clustering (DBSCAN). Our method produces a determinate result and is robust against noise. A broad comparative evaluation demonstrates the superior performance of 4C over competing methods such as DBSCAN, CLIQUE and ORCLUS.
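The contrast between global and local correlation described in the abstract can be illustrated with a small sketch. This is not the 4C algorithm itself. It only shows, on synthetic data, why PCA over the whole data set can miss correlations that PCA over a local neighborhood recovers. The helper names (`principal_direction`, `local_direction`), the neighborhood radius `eps`, and the two synthetic groups are all assumptions made for illustration.

```python
import numpy as np

def principal_direction(points):
    """Return the unit direction of largest variance and the share of variance it explains."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1], eigvals[-1] / eigvals.sum()

def local_direction(points, index, eps):
    """Run PCA only on the points within distance eps of points[index] (hypothetical helper)."""
    dists = np.linalg.norm(points - points[index], axis=1)
    return principal_direction(points[dists <= eps])

# Two synthetic subgroups with opposite linear correlations (illustrative data only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
group_a = np.column_stack([t, t + rng.normal(scale=0.05, size=t.size)])         # y ~ x
group_b = np.column_stack([t, 30.0 - t + rng.normal(scale=0.05, size=t.size)])  # y ~ 30 - x
data = np.vstack([group_a, group_b])

# Global PCA sees the gap between the groups, not either correlation.
global_dir, global_ratio = principal_direction(data)

# Local PCA around one point of each group recovers that group's own correlation.
dir_a, ratio_a = local_direction(data, 100, eps=2.0)  # a point in group_a
dir_b, ratio_b = local_direction(data, 300, eps=2.0)  # a point in group_b

print("global direction:", np.round(global_dir, 3))
print("local direction in group A:", np.round(dir_a, 3))
print("local direction in group B:", np.round(dir_b, 3))
```

In this toy setting the global principal direction is close to the vertical axis, while the local directions follow the diagonals of the two groups. A density-based clustering step, as in the combination the abstract describes, would then need to group points whose neighborhoods share a similar local orientation; that step is omitted here.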