Abstract

We have witnessed a surge both in the number of big data applications hosted by an assortment of cloud vendors and in the astronomical volume of data they produce and consume daily. Traditional cluster computing frameworks can hardly cope with this unprecedented data volume or with data that is geo-distributed across regions and clouds, owing to their limited scalability and adaptability across heterogeneous clouds. Moreover, running data-intensive applications across clouds indiscriminately is highly cost-inefficient and can incur prohibitive expenses. We therefore introduce PIVOT, a cloud-agnostic system with a novel cost-aware scheduling algorithm that enables data-intensive applications to run and scale across clouds instantly and cost-efficiently. We evaluate the system and its scheduling algorithm extensively using the Alibaba production cluster trace, as well as real-world big data applications on a 100-node deployment spanning 11 regions (31 availability zones) on AWS and GCP. The experimental results show that, compared with state-of-the-art baselines, PIVOT saves up to 90.8% of VM subscription expenses and up to 99.2% of egress network traffic expenses. Notably, cost-aware scheduling also speeds up data transfers for data-intensive applications by over 4x.
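The abstract does not describe how PIVOT's cost-aware scheduler actually works. As a rough illustration of the trade-off it targets (VM subscription cost versus cross-cloud egress cost), here is a minimal greedy sketch, not the authors' algorithm: each task is placed in the zone that minimizes its estimated VM cost plus the egress cost of moving its input data into that zone. The zone names, prices, and the `Zone`, `Task`, `placement_cost`, and `schedule` names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    vm_price_per_hour: float  # hypothetical VM price in USD/hour


@dataclass(frozen=True)
class Task:
    name: str
    input_gb: dict[str, float]  # zone name -> GB of input data stored there
    runtime_hours: float        # assumed known runtime estimate


def egress_cost(src: str, dst: str, gb: float,
                egress_price: dict[tuple[str, str], float]) -> float:
    """Cost of moving `gb` GB from zone `src` to zone `dst` (free if local)."""
    if src == dst:
        return 0.0
    # Raises KeyError if no price is defined for this (src, dst) pair.
    return gb * egress_price[(src, dst)]


def placement_cost(task: Task, zone: Zone,
                   egress_price: dict[tuple[str, str], float]) -> float:
    """Estimated VM cost plus egress cost of running `task` in `zone`."""
    vm = zone.vm_price_per_hour * task.runtime_hours
    net = sum(egress_cost(src, zone.name, gb, egress_price)
              for src, gb in task.input_gb.items())
    return vm + net


def schedule(tasks: list[Task], zones: list[Zone],
             egress_price: dict[tuple[str, str], float]) -> dict[str, str]:
    """Greedily assign each task to its cheapest zone (ignores capacity)."""
    return {
        task.name: min(zones, key=lambda z: placement_cost(task, z, egress_price)).name
        for task in tasks
    }


# Hypothetical example: one AWS zone and one GCP zone with made-up prices.
zones = [Zone("aws-use1", 0.10), Zone("gcp-usc1", 0.08)]
egress = {("aws-use1", "gcp-usc1"): 0.09, ("gcp-usc1", "aws-use1"): 0.12}
tasks = [
    Task("etl", {"aws-use1": 50.0}, runtime_hours=2.0),    # data-heavy: stay near data
    Task("train", {"gcp-usc1": 1.0}, runtime_hours=10.0),  # local data and cheaper VMs
    Task("small", {"aws-use1": 0.1}, runtime_hours=20.0),  # tiny input: cheaper VMs win
]
plan = schedule(tasks, zones, egress)
print(plan)
```

With these made-up prices, the data-heavy `etl` task stays next to its 50 GB of input, whereas the long-running `small` task moves to the cheaper zone because its 0.1 GB transfer costs less than the VM price difference. A realistic scheduler would also have to account for capacity limits, data shared between tasks, and changing prices, none of which this greedy sketch models.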
