Publication | Closed Access
An analysis of data corruption in the storage stack
Citations: 278
References: 24
Year: 2008
Software Maintenance, Storage Performance, Engineering, Information Security, Verification, Information Forensics, Software Analysis, Formal Verification, Hardware Security, Data Consistency, Reliability Engineering, Data Integrity, Data Science, Same Storage System, Production Storage Systems, Management, Data Integration, Data Management, File System, Computer Engineering, Computer Science, Data Security, Silent Data Corruption, Software Testing, Storage Security, Storage Stack, System Software, Data Modeling, Integrity Verification
Silent data corruption poses a serious threat to reliable data storage, and understanding its characteristics is essential for developing protection mechanisms. This article presents the first large-scale study of data corruption. The authors analyzed corruption instances recorded in production storage systems containing 1.53 million disk drives over 41 months, examining checksum mismatches, identity discrepancies, and parity inconsistencies, with a focus on checksum mismatches because they are the most common. They identified more than 400,000 checksum mismatches and found that nearline disks (and their adapters) develop mismatches an order of magnitude more often than enterprise-class disks, that mismatches on the same disk are not independent and show high spatial and temporal locality, and that mismatches across different disks in the same storage system are not independent. The authors use these observations to derive lessons for corruption-proof system design.
An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this article, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they occur the most. We find more than 400,000 instances of checksum mismatches over the 41-month period. We find many interesting trends among these instances, including: (i) nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise-class disk drives, (ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality, and (iii) checksum mismatches across different disks in the same storage system are not independent. We use our observations to derive lessons for corruption-proof system design.
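The abstract names three classes of corruption without defining how each is detected. As a rough illustration only, the Python sketch below shows one plausible way each class could be checked. Everything in it is an assumption made for illustration and is not taken from the article: the `Block` layout, the use of CRC-32 as the checksum, a stored logical block number as the "identity", and simple XOR parity across a stripe of equal-sized blocks.

```python
import zlib
from dataclasses import dataclass
from functools import reduce
from typing import Sequence


@dataclass
class Block:
    """Hypothetical on-disk block: payload plus metadata written alongside it."""
    data: bytes
    stored_checksum: int   # checksum recorded when the block was written
    stored_identity: int   # logical block number recorded when it was written


def make_block(data: bytes, identity: int) -> Block:
    """Build a block whose metadata is consistent with its payload."""
    return Block(data, zlib.crc32(data), identity)


def has_checksum_mismatch(block: Block) -> bool:
    """True if the payload no longer matches the checksum stored with it."""
    return zlib.crc32(block.data) != block.stored_checksum


def has_identity_discrepancy(block: Block, expected_identity: int) -> bool:
    """True if the block read back is not the block the caller asked for
    (for example, after a write that landed at the wrong location)."""
    return block.stored_identity != expected_identity


def xor_parity(chunks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks (assumed single-parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def has_parity_inconsistency(stripe: Sequence[Block], stored_parity: bytes) -> bool:
    """True if the parity recomputed from the data blocks differs from the
    parity block stored for the stripe."""
    return xor_parity([b.data for b in stripe]) != stored_parity
```

In this sketch, altering a block's payload after `make_block` makes `has_checksum_mismatch` return true and also invalidates any parity computed earlier over its stripe, while a block that is internally consistent but read from the wrong location is caught only by `has_identity_discrepancy`. That is one way to see why the three classes are tracked separately.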