Publication | Closed Access
GUFI: A framework for GPUs reliability assessment
Citations: 60
References: 16
Year: 2016
Unknown Venue
Engineering, GPU Benchmarking, Computer Architecture, GPU Computing, Hardware Security, Reliability Engineering, Parallel Computing, Reliability, Computer Engineering, Computer Science, Reliability Assessment Findings, GPU Cluster, GPUs Reliability Assessment, GPU Architectures, GPU Architecture, Program Analysis, Software Testing, Many-core Architecture, Parallel Programming, Fault Injection, GPU Virtualization
Modern many-core Graphics Processing Units (GPUs) are extensively employed in general-purpose computing (GPGPU), offering a remarkable execution speedup to inherently data-parallel workloads. GPGPU computing has more stringent reliability requirements than graphics computing. Thus, accurate reliability assessment of GPU hardware structures is important for making informed decisions about error protection. In this paper we focus on microarchitecture-level reliability assessment for GPU architectures. The paper makes the following contributions. First, it presents a comprehensive fault injection framework that targets key hardware structures of GPU architectures, such as the register file, the shared memory, the SIMT stack, and the instruction buffer, which together occupy a large part of a modern GPU's silicon area. Second, it reports our reliability assessment findings for the target structures when the GPU executes a diverse set of twelve GPGPU applications. Third, it discusses remarkable differences in the fault injection results when the applications are simulated in NVIDIA's virtual GPU instruction set (PTX) vs. the actual instruction set (SASS). Finally, it discusses how the framework can be employed either by architects in the early stages of the design phase or by programmers to enhance a GPU application's error resilience.
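To make the idea of fault injection into a hardware structure concrete, the following is a minimal Python sketch of an injection campaign against a toy register file. It is not GUFI's implementation: the single-bit-flip fault model, the eight-register file, the `run_kernel` workload, and the two outcome classes ("masked" and "sdc", for silent data corruption) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import random

NUM_REGS = 8    # toy register file size (hypothetical, not a real GPU parameter)
REG_BITS = 32   # assumed register width


def flip_bit(value, bit):
    """Return value with one bit inverted (assumed single-bit transient fault)."""
    return value ^ (1 << bit)


def run_kernel(regs):
    """Toy workload: only the first four registers are live; the rest are dead."""
    return sum(regs[:4]) & 0xFFFFFFFF


def inject_and_classify(regs, reg_idx, bit):
    """Compare a faulty run against the fault-free (golden) run."""
    golden = run_kernel(regs)
    faulty = list(regs)
    faulty[reg_idx] = flip_bit(faulty[reg_idx], bit)
    return "masked" if run_kernel(faulty) == golden else "sdc"


def campaign(regs, trials, seed=0):
    """Inject one random bit flip per trial and count the outcomes."""
    rng = random.Random(seed)
    counts = {"masked": 0, "sdc": 0}
    for _ in range(trials):
        reg_idx = rng.randrange(NUM_REGS)
        bit = rng.randrange(REG_BITS)
        counts[inject_and_classify(regs, reg_idx, bit)] += 1
    return counts


if __name__ == "__main__":
    registers = [10, 20, 30, 40, 0, 0, 0, 0]
    print(campaign(registers, trials=1000))
```

In this toy setting, a flip in a live register always corrupts the output, while a flip in a dead register is always masked, so the masked fraction simply reflects how much of the structure the workload actually uses. A real microarchitecture-level framework would inject into a simulated GPU during application execution and would typically distinguish further outcomes such as crashes and hangs.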