Publication | Closed Access
FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems
Citations: 120
References: 132
Year: 2014
Unknown Venue
Distributed File System, Cluster Computing, Engineering, Computer Architecture, Parallel Storage, Data Science, Data-intensive Platform, Computing Systems, Data Integration, Parallel Computing, Big Data, Parallel File System, Data Management, High-throughput Computing, High-performance Data Analytics, Distributed Storage Layer, File Systems, Computer Science, Data-intensive Computing, Scalable Computing, Storage Resources, Edge Computing, Cloud Computing, Parallel Programming, Data-intensive Scientific Applications, File System, System Software, Extreme-scale High-performance
The state-of-the-art, yet decades-old, architecture of high-performance computing systems separates compute and storage resources. This limits modern data-intensive scientific applications, because every I/O operation must be transferred over the network between the compute and storage resources. In this paper we propose an architecture that has a distributed storage layer local to the compute nodes. This layer is responsible for most of the I/O operations and avoids extreme amounts of data movement between compute and storage resources. We have designed and implemented a system prototype of this architecture, which we call the FusionFS distributed file system, to support metadata-intensive and write-intensive operations, both of which are critical to the I/O performance of scientific applications. FusionFS has been deployed and evaluated on up to 16K compute nodes of an IBM Blue Gene/P supercomputer, showing more than an order of magnitude performance improvement over other popular file systems such as GPFS, PVFS, and HDFS.