Publication | Closed Access
Fault resilience of the algebraic multi-grid solver
Citations: 75
References: 22
Year: 2012
Unknown Venue
Mathematical Programming, Fault Vulnerability, Engineering, Computer Architecture, Software Analysis, Grid Network, Hardware Security, Reliability Engineering, Fault Resilience, Fault Analysis, Systems Engineering, Fault Recovery, Grid System, HPC System, Parallel Computing, Hybrid HPC Workload, Computer Engineering, Computer Science, Key HPC Algorithms, Program Analysis, Software Testing, Parallel Programming, Fault Attack, Fault Injection
As HPC system sizes grow to millions of cores and chip feature sizes continue to decrease, HPC applications become increasingly exposed to transient hardware faults. These faults can cause aborts and performance degradation. Most importantly, they can corrupt results. Thus, we must evaluate the fault vulnerability of key HPC algorithms to develop cost-effective techniques to improve application resilience.