
Publication | Closed Access

G-SEPM

Citations: 17

References: 34

Year: 2021

Abstract

As GPUs become ubiquitous in large-scale, general-purpose HPC systems (GPGPUs), ensuring the reliable execution of such systems in the presence of soft errors is increasingly essential. To gain insight into how resilient GPU programs are to soft errors, researchers typically rely on random fault injection (FI). However, random FI is expensive when a statistically significant resilience profile is required, and it is not suited to identifying all the error-critical fault sites of a GPU program.
