Publication | Closed Access
Fast data in the era of big data
Citations: 91
References: 48
Year: 2013
Unknown Venue
Cluster Computing, Engineering, Map-reduce, Data Streaming Architecture, Real-time Databases, Information Retrieval, Data Science, Data Mining, Data-intensive Platform, Management, Data Integration, Data Management, Knowledge Discovery, Computer Science, Big Data Search, Data Stream Management, Correction Service, Latency Requirements, Cloud Computing, Massive Data Processing, Big Data
Twitter’s real‑time query suggestion and spelling‑correction must deliver relevant results within minutes after breaking news events, a challenge distinct from traditional web search. The study presents the architecture of Twitter’s real‑time query suggestion and spelling‑correction service and illustrates the challenges of real‑time data processing in the era of big data. The system was first built on a Hadoop stack that failed to meet low‑latency needs, then replaced by a custom in‑memory engine designed for real‑time processing. The authors conclude that Hadoop is unsuitable for low‑latency real‑time analytics and suggest future platforms must support both big and fast data.
We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production today, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data.