Concepedia

Publication | Closed Access

GUFI: A framework for GPUs reliability assessment

60

Citations

16

References

2016

Year

Abstract

Modern many-core Graphics Processing Units (GPUs) are extensively employed in general purpose computing (GPGPU), offering a remarkable execution speedup to inherently data parallel workloads. Unlike graphics computing, GPGPU computing has more stringent reliability requirements. Thus, accurate reliability assessment of GPU hardware structures is important for making informed decisions for error protection. In this paper we focus on microarchitecture-level reliability assessment for GPU architectures. The paper makes the following contributions. First, it presents a comprehensive fault injection framework that targets key hardware structures of GPU architectures such as the register file, the shared memory, the SIMT stack and the instruction buffer, which altogether occupy large part of a modern GPU silicon area. Second, it reports our reliability assessment findings for the target structures, when the GPU executes a diverse set of twelve GPGPU applications. Third, it discusses remarkable differences in the results of fault injection when the applications are simulated in the virtual NVIDIA GPUs instruction set (ptx) vs. the actual instruction set (sass). Finally, it discusses how the framework can be employed either by architects in the early stages of design phase or by programmers for a GPU application's error resilience enhancement.

References

YearCitations

Page 1