A General Method for Scaling Up Machine Learning Algorithms and its Application to Clustering

Pedro Domingos, Geoff Hulten (2001)

Abstract

We propose to scale learning algorithms to arbitrarily large databases by the following method. First, derive an upper bound on the learner's loss as a function of the number of examples used in each step of the algorithm. Then use this bound to minimize the number of examples used in each step, while guaranteeing that the model produced does not differ significantly from the one that would be obtained with infinite data. We apply the method to K-means clustering and empirically observe its speedup relative to the standard version on large databases.
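To make the two-step recipe concrete, here is a minimal sketch under stated assumptions. It is not the authors' algorithm. As an illustrative stand-in for "an upper bound on the loss as a function of the number of examples", it uses the two-sided Hoeffding inequality: the mean of n values lying in an interval of width R is within epsilon of its expectation with probability at least 1 - delta once n >= R^2 ln(2/delta) / (2 epsilon^2). The paper's actual bound, which compares the run against K-means on infinite data across all iterations, may differ. The sketch also assumes 1-D data read in a random order fixed once up front. The function names `hoeffding_sample_size` and `sampled_kmeans_1d` are hypothetical. Each K-means step stops reading examples as soon as every cluster's mean rests on at least that many points.

```python
import math
import random


def hoeffding_sample_size(value_range, delta, epsilon):
    """Smallest n such that, by the two-sided Hoeffding inequality, the mean of
    n i.i.d. values in an interval of width value_range is within epsilon of
    its expectation with probability at least 1 - delta.
    (Illustrative bound chosen for this sketch, not taken from the paper.)"""
    return math.ceil(value_range ** 2 * math.log(2 / delta) / (2 * epsilon ** 2))


def sampled_kmeans_1d(data, centroids, delta, epsilon, iters, seed=0):
    """Hypothetical sketch: Lloyd-style K-means on 1-D data in which each step
    reads only as many examples as needed for every centroid's new mean to be
    computed from at least hoeffding_sample_size(...) points, or from the whole
    dataset if it runs out first. Returns final centroids and examples read per step."""
    rng = random.Random(seed)
    value_range = max(data) - min(data)
    needed = hoeffding_sample_size(value_range, delta, epsilon)
    k = len(centroids)
    centroids = list(centroids)
    # Assumption: the database is scanned in a random order, fixed once here.
    order = list(range(len(data)))
    rng.shuffle(order)
    used_per_step = []
    for _ in range(iters):
        sums = [0.0] * k
        counts = [0] * k
        used = 0
        for idx in order:
            x = data[idx]
            # Assign the example to its nearest current centroid.
            j = min(range(k), key=lambda c: abs(x - centroids[c]))
            sums[j] += x
            counts[j] += 1
            used += 1
            # Stop reading once every cluster has enough examples.
            if min(counts) >= needed:
                break
        # Recompute each centroid; keep the old one if its cluster got no points.
        centroids = [sums[c] / counts[c] if counts[c] else centroids[c] for c in range(k)]
        used_per_step.append(used)
    return centroids, used_per_step
```

For example, on two well-separated Gaussian clusters of 100,000 points each, with `delta=0.05` and `epsilon=0.5`, each step reads only a few thousand examples rather than all 200,000, and the centroids still land near the true cluster means. The per-cluster stopping rule reflects a design choice: each centroid is a mean over its own cluster's points only, so the bound has to hold for every cluster separately, not just for the sample as a whole. For simplicity the sketch ignores the union bound over clusters and steps.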