A cluster ensembles framework

Abstract

Ensemble methods create solutions to learning problems by constructing a set of different individual solutions and then aggregating them appropriately, e.g., by weighted averaging of the predictions in regression, or by taking a weighted vote on the predictions in classification. Such methods, which include Bayesian model averaging, bagging and boosting, have already become very popular for supervised learning problems. For clustering, using ensembles can help to improve the quality and robustness of the results, to re-use existing knowledge, and to deal with distributed-data situations where not all objects or features are simultaneously available for computations. Aggregation strategies can be based on the idea of minimizing average dissimilarity. If only the individual cluster memberships are used, this leads to an optimization problem which is in general computationally hard. For a specific similarity measure which in the crisp case uses overall discordance (modulo relabeling), the characterization of the optimal solution allows the construction of a greedy forward aggregation algorithm (voting) which performs well on a number of clustering problems. Alternative aggregation strategies can be based on re-clustering the objects according to the rate of co-labeling, or on clustering the collection of memberships of all objects grouped according to the labels. We conclude with an outlook on possible further research on cluster ensembles.
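Two of the quantities mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `co_labeling_rate` and `discordance` are hypothetical, the partitions are toy data, labels are assumed to be integers in `0..k-1`, and the relabeling is found by brute force over all permutations, which is only feasible for a small number of clusters.

```python
from itertools import permutations


def co_labeling_rate(partitions):
    """Fraction of partitions in which each pair of objects shares a label.

    `partitions` is a list of crisp labelings, one label per object.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def discordance(a, b, k):
    """Number of objects labeled differently by crisp labelings a and b,
    minimized over all relabelings of b (brute force, small k only)."""
    return min(
        sum(x != perm[y] for x, y in zip(a, b))
        for perm in permutations(range(k))
    )


# Toy ensemble: three crisp partitions of four objects into two clusters.
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first, with labels swapped
    [0, 0, 0, 1],
]

rate = co_labeling_rate(partitions)
d_same = discordance(partitions[0], partitions[1], 2)
d_diff = discordance(partitions[0], partitions[2], 2)
```

The matrix `rate` could then be passed, as a similarity, to any clustering method that accepts pairwise similarities in order to re-cluster the objects. For larger numbers of clusters, the brute-force search over relabelings would normally be replaced by a linear assignment solver.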
