Publication | Closed Access

Application of Genetic Algorithm in Document Clustering

Citations: 17

References: 9

Year: 2009

Abstract

After surveying existing methods for document clustering, we propose a new dynamic method based on a genetic algorithm (GA). K-means is a greedy algorithm that is sensitive to the choice of initial cluster centers and easily converges to a local optimum. A genetic algorithm, by contrast, is a globally convergent search method that can locate the best cluster centers more readily. In traditional document clustering methods, the document similarity matrix is sparse. In this paper, we propose new formulas that improve on the traditional method, and we then make several improvements to the genetic algorithm. All individuals are encoded as floating-point numbers, and the sum of the mean square deviations of the intra-class distances is adopted as the objective function. The steps of the algorithm are given in detail. Experimental results show that the GA reaches an accuracy of over 98 percent and produces better clustering results than K-means.
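The abstract names two design choices, floating-point encoding of individuals and an intra-class squared-distance objective, but does not give the operators or parameters. The following is a minimal sketch of how such a GA could look, not the authors' implementation. Everything beyond those two choices is an assumption made for illustration: the function names (`decode`, `objective`, `ga_cluster`), tournament selection, arithmetic crossover, Gaussian mutation, elitism, and all default parameter values. It also works on plain numeric vectors with Euclidean distance rather than the paper's document similarity formulas, which the abstract does not state.

```python
import random


def decode(chromosome, k, dim):
    """Split a flat list of k*dim floats into k cluster centers."""
    return [chromosome[i * dim:(i + 1) * dim] for i in range(k)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(chromosome, points, k):
    """Sum of squared distances from each point to its nearest center.

    Lower is better. This stands in for the intra-class deviation
    objective named in the abstract (assumed form).
    """
    centers = decode(chromosome, k, len(points[0]))
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def ga_cluster(points, k, pop_size=30, generations=60,
               mutation_rate=0.1, sigma=0.1, seed=0):
    """Search for k cluster centers with a simple real-coded GA (assumed operators)."""
    rng = random.Random(seed)
    dim = len(points[0])

    def fitness(ch):
        return objective(ch, points, k)

    # Each individual is a flat float vector built from k random data points.
    population = [[float(x) for p in rng.sample(points, k) for x in p]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)

    for _ in range(generations):
        next_gen = [best[:]]  # elitism: the best individual survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            # Arithmetic crossover: a random convex combination of the parents.
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            # Gaussian mutation applied independently to each gene.
            child = [g + rng.gauss(0, sigma) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
        best = min(population, key=fitness)

    return decode(best, k, dim), fitness(best)
```

Calling `ga_cluster(points, k)` with a list of equal-length numeric vectors returns the best centers found and their objective value. Because the best individual is carried forward each generation, the objective never gets worse over time; that is a property of this sketch's elitism, not a claim about the paper's algorithm.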
