Feature Selection in k-Median Clustering

Abstract

An effective method for selecting features in clustering unlabeled data is proposed, based on changing the objective function of the standard k-median clustering algorithm. The change consists of perturbing the objective function by a term that drives the medians of each of the k clusters toward the (shifted) global median of zero for the entire dataset. As the perturbation parameter is increased, more and more features are driven automatically toward the global zero median and are eliminated from the problem until one last feature remains. An error curve for unlabeled data clustering, as a function of the number of features used, gives the reduced-feature clustering error relative to the "gold standard" of the full-feature clustering. This clustering error curve parallels a classification error curve based on real data labels. This justifies the use of the former error curve for unlabeled data as a means of choosing an appropriate number of reduced features in order to achieve a correctness comparable to that obtained by the full set of original features. For example, on the 3-class Wine dataset, clustering with 4 selected input space features is comparable to within 4% to clustering using the original 13 features of the problem.
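
The perturbation idea described above can be illustrated with a minimal sketch. The abstract does not state the objective explicitly, so the form used here is an assumption: the sum of 1-norm distances from each point to its nearest cluster median, plus a parameter `lam` times the sum of the 1-norms of the k medians. Under that assumption, each coordinate of a cluster median is a weighted median of the cluster's values together with a zero of weight `lam`. All function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import median


def shift_to_zero_median(data):
    """Shift every feature so that its global median over the dataset is zero."""
    n = len(data[0])
    med = [median([row[f] for row in data]) for f in range(n)]
    return [[row[f] - med[f] for f in range(n)] for row in data]


def penalized_median(values, lam):
    """Return a minimizer of sum(|v - c|) + lam * |c| over c.

    This is a weighted median of the values (weight 1 each) and the
    point 0 (weight lam). Ties resolve to the smallest minimizer.
    """
    points = sorted([(v, 1.0) for v in values] + [(0.0, lam)])
    half = (len(values) + lam) / 2.0
    acc = 0.0
    for v, w in points:
        acc += w
        if acc >= half:
            return v
    return points[-1][0]


def k_median(data, centers, lam, max_iter=100):
    """Alternate 1-norm assignment and penalized median updates (assumed objective)."""
    k, n = len(centers), len(data[0])
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assign each point to the nearest center in the 1-norm.
        labels = [
            min(range(k), key=lambda j: sum(abs(a - b) for a, b in zip(x, centers[j])))
            for x in data
        ]
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append(
                    [penalized_median([x[f] for x in members], lam) for f in range(n)]
                )
            else:
                new_centers.append(centers[j])  # keep the old center for an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def surviving_features(centers):
    """A feature is eliminated once all k cluster medians are zero in it."""
    return [f for f in range(len(centers[0])) if any(c[f] != 0 for c in centers)]


# Hypothetical toy data: feature 0 separates two groups, feature 1 is weak noise.
toy = [[-10.0, 0.5], [-11.0, -0.2], [-9.0, 0.3], [-10.0, 0.1],
       [10.0, -0.4], [11.0, 0.2], [9.0, -0.1], [10.0, -0.3]]
shifted = shift_to_zero_median(toy)
start = [shifted[0], shifted[4]]
for lam in (0.0, 3.0):
    centers, labels = k_median(shifted, start, lam)
    print(lam, surviving_features(centers))
```

A feature whose medians are zero in every cluster contributes the same amount to every point-to-median distance, so it no longer influences the assignment step. Sweeping `lam` upward and recording the surviving features at each value gives the kind of feature-count sequence along which a clustering error curve could be evaluated.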
