Publication | Closed Access
Mining complex models from arbitrarily large databases in constant time
66
Citations
13
References
2002
Year
Unknown Venue
Incremental LearningEngineeringMachine LearningPattern DiscoveryPattern MiningInformation RetrievalData ScienceData MiningManagementDiscrete SearchData Intensive ModelingData ModelingOnline AlgorithmKnowledge DiscoveryScaling-up MethodComputer ScienceBig Data SearchDatabase TheoryInduction AlgorithmQuery OptimizationData Stream MiningComplex ModelsStructure MiningBig Data
In this paper we propose a scaling-up method that is applicable to essentially any induction algorithm based on discrete search. The result of applying the method to an algorithm is that its running time becomes independent of the size of the database, while the decisions made are essentially identical to those that would be made given infinite data. The method works within pre-specified memory limits and, as long as the data is iid, only requires accessing it sequentially. It gives anytime results, and can be used to produce batch, stream, time-changing and active-learning versions of an algorithm. We apply the method to learning Bayesian networks, developing an algorithm that is faster than previous ones by orders of magnitude, while achieving essentially the same predictive performance. We observe these gains on a series of large databases "generated from benchmark networks, on the KDD Cup 2000 e-commerce data, and on a Web log containing 100 million requests.
| Year | Citations | |
|---|---|---|
Page 1
Page 1