Publication | Closed Access
Scalable Distributed Stream Join Processing
Year: 2015
Citations: 87
References: 35
Venue: Unknown
Topics: Cluster Computing, Engineering, Streaming Algorithm, Scalable Stream, Data Streaming Architecture, Real-time Analytics, Data Science, Data Integration, Parallel Computing, Data Management, Stream Processing, Streaming Engine, Computer Science, Data Stream Management, Edge Computing, Cloud Computing, Parallel Programming, Data Streams, Big Data
Efficient and scalable stream joins play an important role in real-time analytics for many cloud applications. However, as in conventional database processing, online theta-joins over data streams are computationally expensive. Moreover, because they are memory-based, they impose high memory requirements on the system. In this paper, we propose a novel stream join model, called join-biclique, which organizes a large cluster as a complete bipartite graph. Join-biclique has several strengths over state-of-the-art techniques, including memory efficiency, elasticity and scalability. These features are essential for building efficient and scalable streaming systems. Based on join-biclique, we develop BiStream, a scalable distributed stream join system that runs on a large-scale commodity cluster. Specifically, BiStream is designed to support efficient full-history joins, window-based joins and online data aggregation. BiStream also supports adaptive resource management, dynamically scaling the system out and down according to its application workloads. We provide both a theoretical cost analysis and an extensive experimental study of the efficiency, elasticity and scalability of BiStream.
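The join-biclique idea can be illustrated with a small single-process model. It splits processing units into an R side and an S side, and every R-unit is connected to every S-unit, which forms the complete bipartite graph. Each arriving tuple is stored exactly once on its own side and sent to every unit on the opposite side to probe for matches under an arbitrary theta predicate. Because each tuple is stored only once, this is where the memory efficiency comes from. This is a minimal sketch under stated assumptions, not BiStream's implementation. The class names `Unit` and `JoinBiclique`, the random placement policy and the full-history (unbounded) stores are all hypothetical choices made for illustration. Windowing, aggregation, networking and elastic scaling are omitted.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[Tuple, Tuple]


class Unit:
    """Hypothetical processing unit holding tuples of its own stream."""

    def __init__(self, side: str):
        self.side = side
        self.store: List[Tuple] = []  # full-history store (no window assumed)


class JoinBiclique:
    """Toy model: m R-units and n S-units, each R-unit linked to each S-unit."""

    def __init__(self, m: int, n: int,
                 theta: Callable[[Tuple, Tuple], bool], seed: int = 0):
        self.r_units = [Unit("R") for _ in range(m)]
        self.s_units = [Unit("S") for _ in range(n)]
        self.theta = theta              # arbitrary join predicate theta(r, s)
        self.rng = random.Random(seed)  # placement policy: random (an assumption)

    def insert(self, stream: str, tup: Tuple) -> List[Pair]:
        if stream == "R":
            own, other = self.r_units, self.s_units
        else:
            own, other = self.s_units, self.r_units
        results: List[Pair] = []
        # Probe: the tuple visits every unit on the opposite side.
        for unit in other:
            for stored in unit.store:
                r, s = (tup, stored) if stream == "R" else (stored, tup)
                if self.theta(r, s):
                    results.append((r, s))
        # Store: exactly one copy is kept on the tuple's own side.
        self.rng.choice(own).store.append(tup)
        return results


# Usage: a band-free theta-join with predicate r.a < s.a.
jb = JoinBiclique(m=2, n=3, theta=lambda r, s: r[0] < s[0])
jb.insert("R", (1,))
print(jb.insert("S", (2,)))
```

In a real deployment each unit would be a separate worker and the "probe" loop would be a network broadcast to the opposite side. Adding a unit to one side only changes where future tuples are stored, which hints at why the bipartite layout lends itself to elastic scaling.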