Publication | Closed Access

ThroughputScheduler: Learning to Schedule on Heterogeneous Hadoop Clusters

Year: 2013

Citations: 29

References: 9

Abstract

Hadoop is the de facto standard for big data analytics applications. Currently available schedulers for Hadoop clusters assign tasks to nodes without regard to the capabilities of those nodes. We propose ThroughputScheduler, which reduces overall job completion time on clusters of heterogeneous nodes by actively scheduling tasks on nodes so that job requirements are optimally matched to node capabilities. Node capabilities are learned by running probe jobs on the cluster. ThroughputScheduler uses a Bayesian active learning scheme to learn the resource requirements of jobs on the fly. An empirical evaluation on a set of sample problems demonstrates that ThroughputScheduler can reduce total job completion time by almost 20% compared to the Hadoop FairScheduler and 40% compared to FIFOScheduler. ThroughputScheduler also reduces average mapping time by 33% compared to either of these schedulers.
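The idea of matching job requirements to node capabilities can be illustrated with a small sketch. This is not the paper's algorithm: the linear cost model (CPU work divided by CPU rate, plus I/O volume divided by I/O rate), the selection rule, and all names (`Node`, `Job`, `estimated_time`, `pick_job_for_node`) are assumptions made for illustration. In the setting the abstract describes, the node rates would come from probe jobs and the job requirements from the Bayesian learner; here they are simply given as inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu_rate: float  # hypothetical: CPU work units per second (e.g. from probe jobs)
    io_rate: float   # hypothetical: I/O units per second (e.g. from probe jobs)


@dataclass(frozen=True)
class Job:
    name: str
    cpu_work: float  # hypothetical: CPU work units per task
    io_work: float   # hypothetical: I/O units per task


def estimated_time(job: Job, node: Node) -> float:
    """Assumed linear cost model: CPU time plus I/O time for one task."""
    return job.cpu_work / node.cpu_rate + job.io_work / node.io_rate


def pick_job_for_node(node: Node, jobs: list[Job], nodes: list[Node]) -> Job:
    """Pick the job that benefits most from running on `node`.

    A job's benefit is measured as its estimated time on this node
    divided by its mean estimated time over all nodes; lower is better.
    """
    def relative_cost(job: Job) -> float:
        mean_time = sum(estimated_time(job, n) for n in nodes) / len(nodes)
        return estimated_time(job, node) / mean_time

    return min(jobs, key=relative_cost)


if __name__ == "__main__":
    nodes = [Node("fast_cpu", cpu_rate=4.0, io_rate=1.0),
             Node("fast_io", cpu_rate=1.0, io_rate=4.0)]
    jobs = [Job("cpu_heavy", cpu_work=8.0, io_work=1.0),
            Job("io_heavy", cpu_work=1.0, io_work=8.0)]
    for node in nodes:
        print(node.name, "->", pick_job_for_node(node, jobs, nodes).name)
```

Under these assumed numbers, the CPU-heavy job is sent to the node with the faster CPU and the I/O-heavy job to the node with the faster disk, whereas a capability-blind scheduler could pair them the other way around.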
