Publication | Closed Access
Optimizing a hybrid SSD/HDD HPC storage system based on file size distributions
Citations: 52
References: 7
Year: 2013
Distributed File System, Cluster Computing, Storage Performance, Engineering, Storage Management, Computer Architecture, Parallel Storage, File Size Distributions, Storage Systems, Data Science, Measurement Methodology, Parallel Computing, Parallel File System, Data Management, Electrical Engineering, File Systems, Computer Engineering, Computer Science, Customer Installations, Cloud Computing, Parallel Programming, Storage System Modeling, In-storage Computing, Big Data
We studied file size distributions from 65 customer installations covering nearly 600 million files in total. We found that between 25% and 90% of all files are 64 Kbytes or less in size, yet these files account for less than 3% of the capacity in most cases. In extreme cases, small files occupy 5% to 15% of capacity. We used this information to size the ratio of SSD to HDD capacity on our latest HPC storage system. Our goal is to automatically place all of the block-level and file-level metadata and all of the small files on SSD, and to use the much cheaper HDD storage for large file extents. The unique storage blade architecture of the Panasas system, which couples SSD, HDD, processor, memory, and networking into a scalable building block, makes this approach very effective. Response time measured by metadata-intensive benchmarks is several times better in our systems that couple SSD and HDD. The paper describes the measurement methodology, the results from our customer survey, and the performance benefits of our approach.
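The two quantities the abstract reports, the share of files at or below 64 Kbytes and the share of capacity those files occupy, can be illustrated with a minimal sketch. This is not the paper's measurement tool: the function names `scan_sizes` and `small_file_stats` are hypothetical, the directory walk is just one plausible way to collect sizes, and the example data is synthetic. It also ignores metadata, which the authors additionally place on SSD.

```python
import os
import stat
from typing import Iterable, Iterator, Tuple

# 64 Kbytes, the small-file cutoff quoted in the abstract.
SMALL_FILE_THRESHOLD = 64 * 1024


def scan_sizes(root: str) -> Iterator[int]:
    """Yield the size in bytes of every regular file under root (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):
                yield info.st_size


def small_file_stats(
    sizes: Iterable[int], threshold: int = SMALL_FILE_THRESHOLD
) -> Tuple[float, float]:
    """Return (fraction of files <= threshold, fraction of bytes held by those files)."""
    sizes = list(sizes)
    total_files = len(sizes)
    total_bytes = sum(sizes)
    small = [s for s in sizes if s <= threshold]
    count_fraction = len(small) / total_files if total_files else 0.0
    capacity_fraction = sum(small) / total_bytes if total_bytes else 0.0
    return count_fraction, capacity_fraction


if __name__ == "__main__":
    # Synthetic example: eight 4 KiB files and two 1 GiB files.
    example = [4 * 1024] * 8 + [1024 ** 3] * 2
    by_count, by_capacity = small_file_stats(example)
    print(f"small files by count:    {by_count:.1%}")
    print(f"small files by capacity: {by_capacity:.4%}")
```

In this synthetic case most files are small while they hold a negligible share of the bytes, which is the pattern the survey describes. Under the assumption that SSD holds only small-file data, the capacity fraction gives a lower bound on the SSD share of total capacity; real sizing would also have to budget for metadata and headroom.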