Publication | Open Access
Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing
Citations: 15
References: 4
Year: 2020
Unknown Venue
Cluster Computing, Engineering, Data Science, Business Intelligence, Data Warehouse, Cloud Computing, Management, Resource Skew, Data Integration, Parallel Programming, Computer Science, Novel Data Warehouse, Data Management, Data Warehousing, Data-intensive Computing, Massive Data Processing, Big Data, High-performance Data Analytics
As the primary approach to deriving decision-support insights, automated recurring routine analytic jobs account for a major part of cluster resource usage in modern enterprise data warehouses. These recurring routine jobs usually have stringent schedules and deadlines determined by external business logic, and thus cause dreadful resource skew and severe resource over-provisioning in the cluster. In this paper, we present Grosbeak, a novel data warehouse that supports resource-aware incremental computing to process recurring routine jobs, smooths the resource skew, and optimizes the resource usage. Unlike batch processing in traditional data warehouses, Grosbeak leverages the fact that data is continuously ingested. It breaks an analysis job into small batches that incrementally process the progressively available data, and schedules these small-batch jobs intelligently when the cluster has free resources. In this demonstration, we showcase Grosbeak using real-world analysis pipelines. Users can interact with the data warehouse by registering recurring queries and observing the incremental scheduling behavior and smoothed resource usage pattern.
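The idea of splitting a recurring job into small batches that run when resources are free can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Grosbeak's actual design or API: the `IncrementalSum` class, the `simulate` function, the tick-based clock, and the `free_slots` list are all hypothetical names introduced for illustration. The job is assumed to be a simple SUM/COUNT aggregate, which can be maintained incrementally by folding in one partition at a time.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalSum:
    """Hypothetical recurring job: a running SUM/COUNT over ingested partitions."""
    total: float = 0.0
    count: int = 0
    pending: list = field(default_factory=list)  # ingested but not yet processed

    def ingest(self, partition):
        """Record a newly available partition without processing it yet."""
        self.pending.append(partition)

    def run_small_batch(self):
        """Fold the oldest pending partition into the partial result."""
        partition = self.pending.pop(0)
        self.total += sum(partition)
        self.count += len(partition)

    def finalize(self):
        """At the deadline, process whatever is still pending and return the result."""
        while self.pending:
            self.run_small_batch()
        return self.total, self.count


def simulate(arrivals, free_slots):
    """Run small batches only on ticks where the cluster has idle slots.

    arrivals[t] is the partition ingested at tick t (or None);
    free_slots[t] is the number of idle slots at tick t (assumed to be known).
    """
    job = IncrementalSum()
    batches_per_tick = []
    for partition, slots in zip(arrivals, free_slots):
        if partition is not None:
            job.ingest(partition)
        ran = 0
        while job.pending and ran < slots:
            job.run_small_batch()
            ran += 1
        batches_per_tick.append(ran)
    leftover = len(job.pending)  # work that remains for the deadline run
    return job.finalize(), batches_per_tick, leftover


arrivals = [[1, 2], [3], None, [4, 5, 6]]
free_slots = [0, 2, 1, 0]
result, batches_per_tick, leftover = simulate(arrivals, free_slots)
print(result, batches_per_tick, leftover)
```

In this toy run, two of the three partitions are processed during idle ticks, so only one partition remains for the deadline run instead of the whole input. A real system would also need to handle non-distributive aggregates, late data, and resource forecasting, none of which this sketch attempts.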