Concepedia

TLDR

Facebook recently launched Facebook Messages, its first user‑facing app built on Apache Hadoop, and uses HBase—a Hadoop‑based database layer—to support billions of messages daily. The paper explains why Facebook selected Hadoop and HBase over alternatives like Cassandra and Voldemort, focusing on the app’s consistency, availability, partition tolerance, data model, and scalability needs. The authors enhanced Hadoop for real‑time use, configured tradeoffs, and designed the system to outperform sharded MySQL, while addressing operational challenges and outlining future improvements. The enhancements to Hadoop provide a more effective real‑time system that outperforms sharded MySQL, and the authors present these findings as a model for other companies considering Hadoop over traditional sharded RDBMS deployments.

Abstract

Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop designed to support billions of messages per day. This paper describes the reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort and discusses the application's requirements for consistency, availability, partition tolerance, data model and scalability. We explore the enhancements made to Hadoop to make it a more effective realtime system, the tradeoffs we made while configuring the system, and how this solution has significant advantages over the sharded MySQL database scheme used in other applications at Facebook and many other web-scale companies. We discuss the motivations behind our design choices, the challenges that we face in day-to-day operations, and future capabilities and improvements still under development. We offer these observations on the deployment as a model for other companies who are contemplating a Hadoop-based solution over traditional sharded RDBMS deployments.

References

YearCitations

Page 1