Stork: making data placement a first class citizen in the grid

TLDR

Scientific applications generate rapidly growing data that must be accessed globally, creating a critical need for efficient and reliable data placement across wide area networks, yet existing solutions are manual or rely on simple scripts lacking automation or fault tolerance. The authors aim to elevate data placement to a first‑class citizen in the Grid, treating it like computational jobs. They developed Stork, a scheduler that queues, schedules, monitors, manages, checkpoints, and ensures successful completion of data placement jobs without human intervention, recognizing their distinct semantics and characteristics.

Abstract

Today's scientific applications have huge data requirements, which continue to increase drastically every year. These data are generally accessed by many users from all across the globe. This implies a major need to move huge amounts of data across wide area networks to complete the computation cycle, which brings with it the problem of efficient and reliable data placement. The current approach to this problem is either to do the placement manually or to employ simple scripts that have no automation or fault tolerance capabilities. Our goal is to make data placement activities first class citizens in the Grid, just like computational jobs. They will be queued, scheduled, monitored, managed, and even checkpointed. More importantly, the system will ensure that they complete successfully and without any human interaction. We also believe that data placement jobs should be treated differently from computational jobs, since they may have different semantics and different characteristics. For this purpose, we have developed Stork, a scheduler for data placement activities in the Grid.
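To make the idea of queued, retried data placement jobs concrete, the following is a minimal sketch and not Stork's actual implementation or interface. The names `PlacementJob`, `run_queue`, `transfer`, and `max_attempts` are hypothetical, introduced only for illustration. It assumes that a transfer signals a transient failure by raising `OSError` and that a failed job is simply requeued until a retry limit is reached.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PlacementJob:
    """Hypothetical data placement request: move data from src to dest."""
    src: str
    dest: str
    attempts: int = 0
    status: str = "queued"  # becomes "done" or "failed"


def run_queue(
    jobs: Iterable[PlacementJob],
    transfer: Callable[[str, str], None],
    max_attempts: int = 3,
) -> List[PlacementJob]:
    """Run jobs in FIFO order, requeueing any job whose transfer fails.

    `transfer` is an assumed callable that raises OSError on failure.
    Returns the jobs in the order they reached a final state.
    """
    queue = deque(jobs)
    finished: List[PlacementJob] = []
    while queue:
        job = queue.popleft()
        job.attempts += 1
        try:
            transfer(job.src, job.dest)
        except OSError:
            if job.attempts < max_attempts:
                queue.append(job)  # retry later, with no human intervention
                continue
            job.status = "failed"  # retry limit reached
        else:
            job.status = "done"
        finished.append(job)
    return finished
```

A caller would pass its own transfer function, for example a wrapper around a file copy or a network transfer tool. The sketch leaves out checkpointing, monitoring, and scheduling policy, which the abstract also names as goals.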
