Clustering With Constraints: Feasibility Issues and the <i>k</i>-Means Algorithm

Abstract

Recent work has looked at extending the k-Means algorithm to incorporate background information in the form of instance level must-link and cannot-link constraints. We introduce two ways of specifying additional background information in the form of δ and ∊ constraints that operate on all instances but which can be interpreted as conjunctions or disjunctions of instance level constraints and hence are easy to implement. We present complexity results for the feasibility of clustering under each type of constraint individually and several types together. A key finding is that determining whether there is a feasible solution satisfying all constraints is, in general, NP-complete. Thus, an iterative algorithm such as k-Means should not try to find a feasible partitioning at each iteration. This motivates our derivation of a new version of the k-Means algorithm that minimizes the constrained vector quantization error but at each iteration does not attempt to satisfy all constraints. Using standard UCI datasets, we find that using constraints improves accuracy as others have reported, but we also show that our algorithm reduces the number of iterations until convergence. Finally, we illustrate these benefits and our new constraint types on a complex real world object identification problem using the infra-red detector on an Aibo robot.

References

Page 1

	Year	Citations

Page 1