Publication | Closed Access
Characterizing the Impact of Intermittent Hardware Faults on Programs
Citations: 52
References: 39
Year: 2014
Software Maintenance, Engineering, Verification, Computer Architecture, Software Engineering, Software Analysis, Hardware Security, Reliability Engineering, Fault Analysis, Systems Engineering, Failure Detection, Intermittent Hardware Faults, Hardware Reliability, Computer Engineering, Extreme Complementary Metal-oxide-semiconductor, Computer Science, Fault-tolerance Techniques, Program Analysis, Technology Scaling, Software Testing, Fault Attack, Fault Injection, System Software
Extreme complementary metal-oxide-semiconductor (CMOS) technology scaling is causing significant concerns about the reliability of computer systems. Intermittent hardware errors are non-deterministic bursts of errors that occur in the same physical location. Recent studies have found that 40% of the processor failures in real-world machines are due to intermittent hardware errors. Studying the effects of intermittent faults on programs is a critical step toward building fault-tolerance techniques of reasonable accuracy and cost. In this work, we characterize the impact of intermittent hardware faults on programs using fault-injection campaigns in a microarchitectural processor simulator. We find that 80% of the non-benign intermittent hardware errors activate a hardware trap in the processor, and the remaining 20% cause silent data corruptions. We have also investigated the possibility of using the program state at failure time in software-based diagnosis techniques, and found that much of the erroneous data is intact and can be used to identify the source of the error.
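To make the methodology in the abstract concrete, the following is a minimal toy sketch of an intermittent fault-injection campaign. It is not the authors' simulator or fault model. The workload (`DATA`), the two "registers" (`idx`, `acc`), the stuck-at-1 fault model, and the function names `run`, `classify`, and `campaign` are all assumptions introduced for illustration. A Python `IndexError` stands in for a hardware trap, and a wrong result with no exception stands in for a silent data corruption.

```python
import random
from collections import Counter

DATA = list(range(1, 17))  # hypothetical toy workload input: 1..16


def run(fault=None):
    """Sum DATA using an index register and an accumulator.

    `fault` is an assumed tuple (register, bit, start_cycle, length):
    the chosen bit of the chosen register is forced to 1 on every cycle
    of the burst, mimicking an intermittent fault at one fixed location.
    """
    idx, acc, cycle = 0, 0, 0
    while idx < len(DATA):
        if fault is not None:
            reg, bit, start, length = fault
            if start <= cycle < start + length:
                if reg == "idx":
                    idx |= 1 << bit
                else:
                    acc |= 1 << bit
        acc += DATA[idx]  # out-of-range index raises IndexError (our "trap")
        idx += 1
        cycle += 1
    return acc


def classify(fault):
    """Compare a faulty run against the fault-free (golden) run."""
    golden = run()
    try:
        out = run(fault)
    except IndexError:
        return "trap"
    return "benign" if out == golden else "sdc"


def campaign(trials=1000, seed=0):
    """Inject one random burst per trial and tally the outcomes."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        fault = (
            rng.choice(["idx", "acc"]),  # fault location (register)
            rng.randrange(8),            # bit position
            rng.randrange(len(DATA)),    # first cycle of the burst
            rng.randint(1, 4),           # burst length in cycles
        )
        counts[classify(fault)] += 1
    return counts


if __name__ == "__main__":
    print(campaign())
```

The outcome proportions of this toy depend entirely on the assumed workload and fault model, so they should not be compared with the 80%/20% split reported above. The sketch only illustrates the campaign structure: a golden run, a fault burst at a fixed location, and a three-way outcome classification.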