Publication | Closed Access
Document Clustering Using K-Means, Heuristic K-Means and Fuzzy C-Means
Citations: 89
References: 10
Year: 2011
Unknown Venue
Cluster Computing, Engineering, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Document Classification, Content Analysis, Fuzzy Pattern Recognition, Document Clustering, Fuzzy Logic, Stop Word Removal, Automatic Classification, Knowledge Discovery, Fuzzy C-means Algorithms, Heuristic K-means, Keyword Extraction, Classification, Fuzzy Clustering
Document clustering is the unsupervised classification (categorization) of documents into groups (clusters) such that documents within a cluster are similar to one another, whereas documents in different clusters are dissimilar. The documents may be web pages, blog posts, news articles, or other text files. This paper presents our experimental work on applying the K-means, heuristic K-means, and fuzzy C-means algorithms to the clustering of text documents. We experimented with different representations (tf, tf.idf, and Boolean) and different feature selection schemes (with or without stop word removal, and with or without stemming). We ran our implementations on some standard datasets and computed various performance measures for these algorithms. The results indicate that the tf.idf representation and the use of stemming yield better clustering. Moreover, fuzzy clustering produces better results than both K-means and heuristic K-means on almost all datasets, and is a more stable method.
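To make the abstract's terms concrete, the following is a minimal sketch of the three document representations (Boolean, tf, tf.idf) and of a textbook fuzzy C-means loop. It is not the authors' implementation. The function names `vectorize` and `fuzzy_c_means`, the idf variant log(N/df), the Euclidean distance, the fuzzifier m = 2, and the toy documents are all assumptions made for illustration; stop word removal and stemming are omitted.

```python
import math
import random
from collections import Counter


def vectorize(docs, scheme="tfidf"):
    """Turn tokenized documents into vectors (hypothetical helper).

    scheme: "boolean" (term present or not), "tf" (raw term count),
    or "tfidf" (count * log(N / df), one common idf variant, assumed here).
    """
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        if scheme == "boolean":
            vec = [1.0 if tf[t] else 0.0 for t in vocab]
        elif scheme == "tf":
            vec = [float(tf[t]) for t in vocab]
        elif scheme == "tfidf":
            vec = [tf[t] * math.log(n / df[t]) for t in vocab]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(vec)
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Textbook fuzzy C-means: returns (memberships U, cluster centers)."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Random initial memberships; each row is normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = []
    for _ in range(iters):
        # Center j is the mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * X[i][k] for i in range(n)) / total for k in range(dim)]
            )
        # Membership update: u_ij is proportional to D_ij ** (-1 / (m - 1)),
        # where D_ij is the squared distance from point i to center j.
        for i in range(n):
            d = [max(sq_dist(X[i], centers[j]), 1e-12) for j in range(c)]
            inv = [dj ** (-1.0 / (m - 1.0)) for dj in d]
            s = sum(inv)
            U[i] = [v / s for v in inv]
    return U, centers


# Toy corpus (invented for illustration): two documents per topic.
docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "cat", "cat"],
    ["stock", "market", "trade"],
    ["market", "trade", "stock", "stock"],
]
vocab, X = vectorize(docs, scheme="tfidf")
U, centers = fuzzy_c_means(X, c=2)
# Harden the soft memberships by taking the strongest cluster per document.
labels = [max(range(2), key=lambda j: row[j]) for row in U]
print(labels)
```

Unlike K-means, which assigns each document to exactly one cluster, each row of `U` holds graded memberships that sum to 1, so a document that mixes topics can belong partly to several clusters. Swapping `scheme` between `"boolean"`, `"tf"`, and `"tfidf"` gives a rough way to compare representations on your own data.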