Publication | Closed Access
A General Method for Scaling Up Machine Learning Algorithms and its Application to Clustering
171
Citations
0
References
2001
Year
Unknown Venue
We propose to scale learning algorithms to arbitrarily large databases by the following method. First derive an upper bound for the learner's loss as a function of the number of examples used in each step of the algorithm. Then use this to minimize each step's number of examples, while guaranteeing that the model produced does not differ significantly from the one that would be obtained with in nite data. We apply the method to K-means clustering, and empirically observe its speedup relative to the standard version on large databases.