Publication | Closed Access
Stateful bulk processing for incremental analytics
Citations: 155
References: 24
Year: 2010
Unknown Venue
Cluster Computing, Engineering, Fragile Code, Computer Architecture, Stateful Bulk Processing, Entire Dataflow, Map-reduce, Data Streaming Architecture, Data Science, Data Integration, Stateful Dataflow Programs, Parallel Computing, Data Management, High-performance Data Analytics, Computer Engineering, Computer Science, Data Stream Management, Data-intensive Computing, Cloud Computing, Parallel Programming, Data-level Parallelism, Massive Data Processing, Big Data
This work addresses the need for stateful dataflow programs that can rapidly sift through huge, evolving data sets. These data-intensive applications perform complex multi-step computations over successive generations of data inflows, such as weekly web crawls, daily image/video uploads, log files, and growing social networks. While programmers may simply re-run the entire dataflow when new data arrives, this is grossly inefficient, increasing result latency and squandering hardware resources and energy. Alternatively, programmers may use prior results to incrementally incorporate the changes. However, current large-scale data processing tools, such as Map-Reduce or Dryad, limit how programmers incorporate and use state in data-parallel programs. Straightforward approaches to incorporating state can result in custom, fragile code and disappointing performance.
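The contrast between re-running the entire dataflow and incrementally folding new data into prior results can be illustrated with a toy example. The sketch below is not the system described in this work; the function names `recompute` and `incremental_update`, the in-memory `Counter` used as state, and the sample batches are all assumptions chosen for illustration. It only shows that, for a simple count aggregation, updating saved state with each new generation of data gives the same answer as reprocessing everything.

```python
from collections import Counter
from typing import Iterable, List


def recompute(all_batches: Iterable[List[str]]) -> Counter:
    """Re-run the whole computation over every batch seen so far."""
    counts = Counter()
    for batch in all_batches:
        counts.update(batch)
    return counts


def incremental_update(state: Counter, new_batch: List[str]) -> Counter:
    """Fold only the newest batch into the state saved by the previous run."""
    state.update(new_batch)
    return state


# Hypothetical successive generations of input, e.g. hosts seen in weekly crawls.
batches = [["a.com", "b.com"], ["a.com", "c.com"], ["c.com", "c.com"]]

state = Counter()
for i, batch in enumerate(batches, start=1):
    state = incremental_update(state, batch)
    # The incremental result matches a full re-run over all data so far,
    # but each step touched only the new batch.
    assert state == recompute(batches[:i])
```

In this toy case the full re-run does work proportional to all data ever received, while the incremental step does work proportional to the new batch only. Real dataflows with multiple steps and non-additive operators need more careful state handling than a single counter.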