Publication | Closed Access
Document clustering by concept factorization
Citations: 308
References: 16
Year: 2004
Venue: Unknown
Document Clustering, Engineering, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Machine Learning, Topic Model, Matrix Factorization, Knowledge Discovery, Document Classification, Optimization-based Data Mining, Computer Science, Cluster Label, Fuzzy Clustering, Data Points, Text Mining, Concept Factorization
In this paper, we propose a new data clustering method called concept factorization, which models each concept as a linear combination of the data points and each data point as a linear combination of the concepts. With this model, the data clustering task is accomplished by computing the two sets of linear coefficients; this computation is carried out by finding the non-negative solution that minimizes the reconstruction error of the data points. The cluster label of each data point can be easily derived from the obtained linear coefficients. This method differs from clustering based on non-negative matrix factorization (NMF) [Xu03] in that it can be applied to data containing negative values and can be implemented in the kernel space. Our experimental results show that the proposed data clustering method and its variations perform best among the 11 algorithms and their variations that we evaluated on both the TDT2 and Reuters-21578 corpora. In addition to its good performance, the new method has the merit of easy and reliable derivation of the clustering results.
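The model described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the data matrix X stores one data point per column, writes the concepts as X W and the reconstruction as X W V^T, and minimizes the squared reconstruction error with NMF-style multiplicative updates. The abstract does not state the update rules, so those used here are an assumption; they are also only valid when the Gram matrix K = X^T X is non-negative, so this sketch does not cover the negative-valued or kernel cases the abstract mentions. The function name `concept_factorization` and its parameters are hypothetical, and taking the arg-max over each row of V is a simplified way to read off cluster labels.

```python
import numpy as np


def concept_factorization(X, k, n_iter=500, eps=1e-10, seed=0):
    """Sketch of concept factorization: X ~ X @ W @ V.T with W, V >= 0.

    X is (n_features, n_samples), one data point per column.
    Assumes the Gram matrix K = X.T @ X has no negative entries.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X                # inner products between data points
    W = rng.random((n, k))     # concept j = X @ W[:, j]
    V = rng.random((n, k))     # data point i ~ sum_j V[i, j] * concept j
    for _ in range(n_iter):
        # Multiplicative updates (an assumption, not taken from the abstract).
        # They keep W and V non-negative because every factor is non-negative.
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W) / (V @ (W.T @ K @ W) + eps)
    labels = V.argmax(axis=1)  # simplified label rule: strongest concept
    return W, V, labels


if __name__ == "__main__":
    # Toy data: three points along each of two orthogonal directions.
    X = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
    W, V, labels = concept_factorization(X, k=2)
    print(labels)
```

Because only the Gram matrix K enters the updates, a kernel matrix could in principle be substituted for K, which is how a kernel-space variant would fit this sketch; that substitution is not exercised here.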