
Publication | Closed Access

GPFS: A Shared-Disk File System for Large Computing Clusters

Year: 2002 | Citations: 1.2K | References: 9

TLDR

GPFS is IBM’s parallel, shared-disk file system, used on many of the world’s largest supercomputers and built on distributed locking and recovery ideas whose scalability had been uncertain. The paper describes GPFS and explains how its distributed locking and recovery mechanisms were extended to scale to large clusters. The authors tested these limits on the largest systems available. They found that although many existing techniques scaled well, many key areas required new approaches.

Abstract

GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community over the last several years, particularly distributed locking and recovery technology. To date it has been a matter of conjecture how well these ideas scale. We have had the opportunity to test those limits in the context of a product that runs on the largest systems in existence. While in many cases existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.
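To make "distributed locking" concrete for a shared-disk file system, here is a minimal, single-process sketch of a lock (token) manager that grants shared ("r") and exclusive ("w") tokens on byte ranges of a file and revokes conflicting tokens held by other nodes. This is a generic illustration under stated assumptions, not the GPFS protocol described in the paper: the names `Token`, `TokenManager`, and `acquire` are hypothetical, there is no network or failure handling, and a conflicting token is revoked whole rather than split or negotiated.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    node: str    # node holding the token (hypothetical identifier)
    start: int   # first byte covered (inclusive)
    end: int     # end of range (exclusive)
    mode: str    # "r" = shared, "w" = exclusive


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    # Half-open ranges [start, end) overlap if each starts before the other ends.
    return a_start < b_end and b_start < a_end


def conflicts(held: Token, node: str, start: int, end: int, mode: str) -> bool:
    # A node never conflicts with its own tokens in this simplified model.
    if held.node == node:
        return False
    if not overlaps(held.start, held.end, start, end):
        return False
    # Two shared (read) tokens may coexist; any exclusive token conflicts.
    return mode == "w" or held.mode == "w"


@dataclass
class TokenManager:
    tokens: list = field(default_factory=list)   # currently granted tokens
    revoked: list = field(default_factory=list)  # tokens taken back from other nodes

    def acquire(self, node: str, start: int, end: int, mode: str) -> Token:
        keep = []
        for t in self.tokens:
            if conflicts(t, node, start, end, mode):
                # Assumption: the holder flushes its cached data and gives up
                # the whole token; a real system might shrink or split it instead.
                self.revoked.append(t)
            else:
                keep.append(t)
        granted = Token(node, start, end, mode)
        keep.append(granted)
        self.tokens = keep
        return granted
```

For example, if node A holds an exclusive token on bytes 0 to 100 and node C then asks to read bytes 50 to 60, A's token is revoked, while a token on a disjoint range (say node B on bytes 200 to 300) is untouched. Using byte ranges rather than whole-file locks lets nodes writing to different parts of the same file proceed without contending, which is the kind of concurrency a shared-disk design needs at cluster scale.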
