Concepedia

Abstract

As HPC system sizes grow to millions of cores and chip feature sizes continue to decrease, HPC applications become increasingly exposed to transient hardware faults. These faults can cause aborts and performance degradation. Most importantly, they can corrupt results. Thus, we must evaluate the fault vulnerability of key HPC algorithms to develop cost-effective techniques to improve application resilience.

References

YearCitations

Page 1