Publication | Open Access
Experimental and analytical study of Xeon Phi reliability
Citations: 49
References: 38
Year: 2017
Unknown Venue
Engineering, Measurement, Computer Architecture, Education, Software Analysis, Hardware Security, Reliability Engineering, Fault Analysis, Error Rate, Instrumentation, Parallel Computing, Reliability, Hardware Reliability, Xeon Phi, Computer Engineering, Computer Science, Device Reliability, HPC Applications, Xeon Phi Reliability, Program Analysis, Software Testing, Circuit Reliability, Fault Attack, Fault Injection, Electrical Insulation
We present an in-depth analysis of the effects of transient faults on HPC applications running on Intel Xeon Phi processors, based on radiation experiments and high-level fault injection. Besides measuring the realistic error rates of the Xeon Phi, we quantify Silent Data Corruptions (SDCs) by correlating the distribution of corrupted elements in the output with the application's characteristics. We also evaluate the benefits of imprecise computing for reducing a program's error rate. For example, for HotSpot, accepting a 0.5% tolerance in the output value reduces the error rate by 85%.
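The tolerance idea in the abstract can be illustrated with a small sketch. It compares an observed output against a fault-free ("golden") output and flags only the elements whose relative deviation exceeds a chosen tolerance. The function names, the zero-value fallback, and the sample values are assumptions made for illustration; they are not the paper's tooling or data.

```python
def corrupted_indices(golden, observed, rel_tol=0.0):
    """Return indices whose relative deviation from the golden value exceeds rel_tol.

    Hypothetical helper for illustration only, not the authors' method.
    """
    bad = []
    for i, (g, o) in enumerate(zip(golden, observed)):
        if g == o:
            continue
        # Relative error against the golden value; when the golden value
        # is zero, fall back to the absolute error (an assumption).
        denom = abs(g) if g != 0 else 1.0
        if abs(o - g) / denom > rel_tol:
            bad.append(i)
    return bad


def is_sdc(golden, observed, rel_tol=0.0):
    """A run counts as an SDC if any output element is out of tolerance."""
    return len(corrupted_indices(golden, observed, rel_tol)) > 0


# Made-up values: element 1 deviates by 0.2%, element 3 by 10%.
golden = [100.0, 200.0, 300.0, 400.0]
observed = [100.0, 200.4, 300.0, 440.0]

strict = corrupted_indices(golden, observed)            # exact match required
relaxed = corrupted_indices(golden, observed, 0.005)    # 0.5% tolerance
```

With exact comparison both deviating elements count as corrupted. With a 0.5% tolerance only the 10% deviation remains, so a run whose only mismatch is the 0.2% deviation would no longer be classified as an SDC.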