Data Summarization Using Clustering and Classification: Spectral Clustering Combined with k-Means Using NFPH

Abstract

Clustering plays an important role in knowledge discovery. Data miners focus on creating high-quality clusters with reduced time complexity in order to extract the most significant information. This paper analyses existing clustering techniques used in data mining and looks for ways to maximize clustering accuracy, with the goal of improving an existing clustering algorithm. We introduce a novel algorithm that combines spectral clustering with k-means using NFPH. The proposed system replaces the centroid initialization method of the classical k-means algorithm, which should address some of that algorithm's limitations: rather than selecting the first centroid at random, we aim to select the most appropriate one. Publicly available research data sets from the medical domain are used to train the model, and the open-source data mining application WEKA is used for testing. In tests on 10 different UCI data sets, the proposed solution reduced the clustering error by up to 2 percent, while the processing time increased to roughly 4 to 5 seconds. The increase in processing time is caused by the replacement of the k-means initialization method. Overall, the proposed system improved on the accuracy of the spectral clustering algorithm at the cost of longer processing time.
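To make the idea of replacing random centroid initialization concrete, the following is a minimal sketch in Python. It is not the paper's method: the abstract does not define NFPH, so the sketch substitutes a simple deterministic farthest-point seeding rule as a stand-in, and the function names `farthest_point_seeds` and `kmeans` are hypothetical. In a spectral clustering pipeline, `points` would be the rows of the spectral embedding; that eigen-decomposition step is omitted here.

```python
import math


def farthest_point_seeds(points, k):
    """Pick k seeds deterministically (assumed stand-in rule, not NFPH itself)."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # First seed: the point closest to the overall mean (an assumed choice).
    seeds = [min(points, key=lambda p: math.dist(p, mean))]
    while len(seeds) < k:
        # Next seed: the point whose nearest existing seed is farthest away.
        seeds.append(
            max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        )
    return seeds


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means iterations started from the deterministic seeds above."""
    centroids = [list(s) for s in farthest_point_seeds(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, k=2))
```

Because the seeds are chosen deterministically, repeated runs on the same data give the same clustering, which is one of the limitations of randomly initialized k-means that a non-random first centroid is meant to address. The extra pass over the data needed to choose the seeds also illustrates why such an initialization can add processing time.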
