Publication | Closed Access
Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams
Citations: 31
References: 10
Year: 2011
Unknown Venue
Cluster Computing, Engineering, Computer Architecture, Streaming Algorithm, Data Streaming Architecture, Data Science, Window Processing, Data Integration, Parallel Computing, Data Management, Stream Processing, Streaming Engine, Computer Engineering, Computer Science, Data Stream Management, Cloud Computing, Parallel Programming, Continuous Split Stage, Data Streams, Big Data
This paper proposes new techniques for efficiently parallelizing sliding window processing over data streams on a shared-nothing cluster of commodity hardware. Data streams are first partitioned on the fly via a continuous split stage that takes the query semantics into account, so that the partitioning respects the natural chunking (windowing) of the stream by the query. This split does not scale well enough when there is a high degree of overlap across the windows. To remedy this problem, we propose two alternative partitioning strategies based on batching and pane-based processing, respectively. Lastly, we provide a continuous merge stage at the end that combines the results on the fly while meeting QoS requirements on ordered delivery. We implemented these techniques as part of the Borealis distributed stream processing system and conducted experiments based on the Linear Road Benchmark that show the scalability of our techniques.
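To illustrate why pane-based processing helps when windows overlap heavily, the following is a minimal single-process sketch of the general idea, not the paper's Borealis implementation. It assumes time-based windows over integer timestamps, a sum aggregate, and a pane size equal to the greatest common divisor of the window size and the slide; the function names `pane_sums` and `window_sums` are hypothetical.

```python
from collections import defaultdict
from math import gcd


def pane_sums(tuples, pane_size):
    """Aggregate (timestamp, value) tuples into one partial sum per pane.

    Each tuple falls into exactly one pane, so panes can be computed
    independently of one another (assumption: integer timestamps >= 0).
    """
    sums = defaultdict(int)
    for ts, value in tuples:
        sums[ts // pane_size] += value
    return sums


def window_sums(tuples, window, slide):
    """Compute sliding-window sums by combining pane partials.

    Returns a list of (window_start_time, sum) for every window that is
    fully covered by the panes seen so far, in start-time order.
    """
    pane = gcd(window, slide)  # assumed pane size: gcd(window, slide)
    partials = pane_sums(tuples, pane)
    if not partials:
        return []
    panes_per_window = window // pane
    panes_per_slide = slide // pane
    last_pane = max(partials)
    results = []
    start = 0
    while start + panes_per_window - 1 <= last_pane:
        # Each window reuses pane partials instead of rescanning raw tuples.
        total = sum(partials.get(p, 0)
                    for p in range(start, start + panes_per_window))
        results.append((start * pane, total))
        start += panes_per_slide
    return results


if __name__ == "__main__":
    stream = [(t, 1) for t in range(10)]
    print(window_sums(stream, window=4, slide=2))
```

In this sketch each input tuple is touched once, when its pane is built, regardless of how many windows it belongs to. In a distributed setting the pane partials would be the natural unit to assign to different nodes, with a merge step combining them in window order; how that assignment is done here is left unspecified.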