Publication | Closed Access
Stork data scheduler: mitigating the data bottleneck in e-Science
Citations: 29
References: 27
Year: 2011
Keywords: Cluster Computing, Engineering, Data Science, Data Access, Cloud Computing, Data-intensive Platform, Stork Data Scheduler, Data Integration, Parallel Programming, Computer Science, Massive Data Processing, Parallel Computing, High-throughput Computing, Data Management, Data Bottleneck, Data-intensive Computing, Big Data, High-performance Data Analytics
In this paper, we present the Stork data scheduler as a solution for mitigating the data bottleneck in e-Science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks and application-level end-to-end optimization of networked inputs/outputs for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities just like computational resources and compute tasks, and not simply the side-effect of computation. Stork provides unique features such as aggregation of data transfer jobs considering their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.
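The idea of aggregating transfer jobs by source and destination address can be illustrated with a minimal sketch. This is not Stork's implementation or API: the function name `aggregate_transfers`, the job representation as `(source_url, destination_url)` pairs, and the example hosts are all assumptions made for illustration. The sketch only shows the grouping step, on the assumption that jobs sharing an endpoint pair could then be handed to a single transfer session.

```python
from collections import defaultdict
from urllib.parse import urlparse


def aggregate_transfers(jobs):
    """Group transfer jobs by their (source host, destination host) pair.

    `jobs` is an iterable of (source_url, destination_url) tuples.
    This is a hypothetical representation, not Stork's job format.
    """
    batches = defaultdict(list)
    for src, dest in jobs:
        # Jobs with the same endpoint pair land in the same batch.
        key = (urlparse(src).netloc, urlparse(dest).netloc)
        batches[key].append((src, dest))
    return dict(batches)


# Hypothetical example jobs; the hosts are placeholders.
jobs = [
    ("ftp://a.example.org/data/f1", "ftp://b.example.org/in/f1"),
    ("ftp://a.example.org/data/f2", "ftp://b.example.org/in/f2"),
    ("ftp://c.example.org/data/f3", "ftp://b.example.org/in/f3"),
]

batches = aggregate_transfers(jobs)
for (src_host, dest_host), group in batches.items():
    print(f"{src_host} -> {dest_host}: {len(group)} job(s)")
```

Grouping by endpoint pair is one plausible reading of "considering their source and destination addresses"; the paper itself should be consulted for how Stork actually forms and schedules aggregated jobs.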