Publication | Closed Access
Parallel boosted regression trees for web search ranking
Citations: 164 | References: 26 | Year: 2011 | Venue: Unknown
Ranking Algorithm, Master Processor, Machine Learning, Engineering, Regression Tree, Machine Learning Tool, Learning To Rank, Text Mining, Information Retrieval, Data Science, Data Mining, Web Search Ranking, Parallel Computing, Supervised Learning, Machine Learning Model, Predictive Analytics, Knowledge Discovery, Computer Science, Deep Learning, Search Engine Design, Parallel Learning, Parallel Programming, Master-worker Paradigm, Big Data
Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine-learned web-search ranking, a domain notorious for very large data sets. In this paper, we propose a novel method for parallelizing the training of GBRT. Our technique parallelizes the construction of the individual regression trees and operates using the master-worker paradigm as follows. The data are partitioned among the workers. At each iteration, each worker summarizes its data partition using histograms. The master processor uses these histograms to build one layer of a regression tree and then sends this layer to the workers, allowing them to build histograms for the next layer. Our algorithm carefully orchestrates the overlap between communication and computation to achieve good performance.
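The layer-by-layer scheme described in the abstract can be sketched as follows. This is a minimal single-process illustration under stated assumptions, not the authors' implementation. Workers are simulated as list entries. Histograms use fixed equal-width bins over per-feature ranges that the master is assumed to have broadcast; the abstract does not specify the histogram type, so this binning is a swapped-in simplification. The loss is squared error on residuals. All function names (`worker_histograms`, `merge`, `best_splits`, `apply_layer`, `build_tree_layerwise`) are hypothetical, and the overlap of communication and computation is not modeled.

```python
# Hypothetical sketch of histogram-based, layer-wise tree construction in a
# master-worker setting. Assumption: equal-width bins, squared-error loss.
NUM_BINS = 4


def bin_index(value, lo, hi, num_bins=NUM_BINS):
    """Map a feature value to one of num_bins equal-width bins on [lo, hi]."""
    if hi <= lo:
        return 0
    b = int((value - lo) / (hi - lo) * num_bins)
    return min(max(b, 0), num_bins - 1)


def worker_histograms(rows, residuals, node_of, ranges):
    """Worker: summarize the local partition as {(node, feature, bin): [sum, count]}."""
    hist = {}
    for x, r, node in zip(rows, residuals, node_of):
        for f, value in enumerate(x):
            lo, hi = ranges[f]
            entry = hist.setdefault((node, f, bin_index(value, lo, hi)), [0.0, 0])
            entry[0] += r
            entry[1] += 1
    return hist


def merge(histograms):
    """Master: histograms are additive, so merging is an elementwise sum."""
    total = {}
    for h in histograms:
        for key, (s, n) in h.items():
            entry = total.setdefault(key, [0.0, 0])
            entry[0] += s
            entry[1] += n
    return total


def best_splits(hist, frontier, num_features):
    """Master: for each node in the current layer, choose the (feature, bin)
    threshold maximizing S_L^2/n_L + S_R^2/n_R - S^2/n from merged histograms."""
    layer = {}
    for node in frontier:
        best = None
        for f in range(num_features):
            bins = [hist.get((node, f, b), [0.0, 0]) for b in range(NUM_BINS)]
            s_tot = sum(s for s, _ in bins)
            n_tot = sum(n for _, n in bins)
            s_left, n_left = 0.0, 0
            for b in range(NUM_BINS - 1):
                s_left += bins[b][0]
                n_left += bins[b][1]
                n_right = n_tot - n_left
                if n_left == 0 or n_right == 0:
                    continue  # this threshold would leave one side empty
                s_right = s_tot - s_left
                gain = s_left**2 / n_left + s_right**2 / n_right - s_tot**2 / n_tot
                if best is None or gain > best[0]:
                    best = (gain, f, b)
        if best is not None:
            layer[node] = (best[1], best[2])  # go left if bin <= threshold bin
    return layer


def apply_layer(rows, node_of, layer, ranges):
    """Worker: route each local example to child 2k+1 (left) or 2k+2 (right)."""
    updated = []
    for x, node in zip(rows, node_of):
        if node not in layer:
            updated.append(node)  # node was not split; example stays in place
            continue
        f, b = layer[node]
        lo, hi = ranges[f]
        updated.append(2 * node + 1 if bin_index(x[f], lo, hi) <= b else 2 * node + 2)
    return updated


def build_tree_layerwise(partitions, ranges, depth):
    """Simulate one tree: partitions is a list of (rows, residuals), one per worker."""
    node_of = [[0] * len(rows) for rows, _ in partitions]
    tree, frontier = {}, [0]
    for _ in range(depth):
        # In a real deployment each worker computes its histograms in parallel.
        hists = [worker_histograms(rows, res, nodes, ranges)
                 for (rows, res), nodes in zip(partitions, node_of)]
        layer = best_splits(merge(hists), frontier, len(ranges))
        tree.update(layer)
        # The master "sends" the layer; workers update their node assignments.
        node_of = [apply_layer(rows, nodes, layer, ranges)
                   for (rows, _), nodes in zip(partitions, node_of)]
        frontier = [c for n in layer for c in (2 * n + 1, 2 * n + 2)]
    return tree, node_of
```

For example, with two workers that each hold one negative-residual point below 0.5 and one positive-residual point above it, `build_tree_layerwise(partitions, [(0.0, 1.0)], depth=1)` splits the root on feature 0 at bin 0 and routes each worker's points to child nodes 1 and 2. The design point the sketch illustrates is that only histograms (whose size scales with nodes × features × bins) and the chosen splits cross the network, never the raw training examples.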