Publication | Open Access
Data sieving and collective I/O in ROMIO
Year: 1999
Citations: 495
References: 13
Unknown Venue
Data Representation, Cluster Computing, Engineering, Computer Architecture, Parallel Storage, NAS BTIO Benchmark, Distinct I/O Requests, Data Science, Management, Data Integration, Parallel Computing, Parallel File System, Data Management, I/O Access Patterns, Data Sieving, Computer Engineering, Computer Science, Data-intensive Computing, External-memory Algorithm, Program Analysis, Parallel Performance Evaluation, Cloud Computing, Parallel Programming, Data-level Parallelism, System Software, Data Modeling
Parallel programs often perform many small, noncontiguous I/O accesses, and issuing each one as a separate request severely degrades performance. The study describes how ROMIO exploits MPI‑IO’s ability to express a noncontiguous access in a single function call to avoid this problem. ROMIO combines data sieving for noncontiguous requests from one process with collective I/O for noncontiguous requests from multiple processes, and the implementation is portable and keeps its memory requirements under control. Performance results for three applications on five parallel machines demonstrate ROMIO’s performance and portability.
The I/O access patterns of parallel programs often consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access a noncontiguous data set with a single I/O function call. This feature provides MPI-IO implementations an opportunity to optimize data access. We describe how our MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how one can implement these optimizations portably on multiple machines and file systems, control their memory requirements, and also achieve high performance. We demonstrate the performance and portability with performance results for three applications: an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC). The results were obtained on five different parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000.
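The idea behind data sieving can be illustrated with a small, self-contained sketch. This is not ROMIO's code: it is a plain Python simulation on an in-memory file. The function name `sieve_read`, the `(offset, length)` request format, and the `buffer_size` parameter are assumptions made for illustration. The sketch reads the whole extent covering all requested pieces in a few large contiguous chunks, copies out only the wanted bytes, and caps memory use by limiting the chunk size.

```python
import io


def sieve_read(f, requests, buffer_size=64):
    """Serve noncontiguous reads with a few large contiguous reads.

    `requests` is a list of (offset, length) pairs, assumed sorted by
    offset and non-overlapping. Returns (pieces, number_of_read_calls).
    Illustrative simulation only; names and defaults are hypothetical.
    """
    if not requests:
        return [], 0
    pieces = [bytearray() for _ in requests]
    start = requests[0][0]                      # first byte wanted
    end = requests[-1][0] + requests[-1][1]     # one past the last byte wanted
    reads = 0
    pos = start
    while pos < end:
        # Read one contiguous chunk, never larger than the sieving buffer.
        f.seek(pos)
        chunk = f.read(min(buffer_size, end - pos))
        reads += 1
        if not chunk:
            break  # hit end of file before the extent was covered
        chunk_end = pos + len(chunk)
        # Copy out only the parts of this chunk that some request asked for.
        for i, (off, length) in enumerate(requests):
            lo = max(off, pos)
            hi = min(off + length, chunk_end)
            if lo < hi:
                pieces[i] += chunk[lo - pos:hi - pos]
        pos = chunk_end
    return [bytes(p) for p in pieces], reads


# Usage: eight 4-byte pieces spaced 16 bytes apart in a 256-byte file.
data = bytes(range(256))
requests = [(i * 16, 4) for i in range(8)]
pieces, reads = sieve_read(io.BytesIO(data), requests, buffer_size=64)
```

Issuing one read per piece would take eight calls here, while the sieving version covers the same extent in two buffer-sized reads at the cost of also transferring the unwanted bytes in the gaps. Whether that trade pays off depends on how sparse the requested pieces are relative to the extent they span.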