Publication | Closed Access
GPU-Qin: A methodology for evaluating the error resilience of GPGPU applications
Citations: 144
References: 21
Year: 2014
Unknown Venue
Keywords: Engineering, GPU Benchmarking, Computer Architecture, Software Engineering, Software Analysis, GPU Computing, Error Resilience, Hardware Security, Reliability Engineering, Compute Kernel, GPGPU Applications, Fault Injection Experiments, Parallel Computing, Reliability, Computer Engineering, Computer Science, GPU Cluster, Fault-injection Methodology, GPU Architecture, Program Analysis, Software Testing, Parallel Programming, Performance Portability, Fault Injection
While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging due to their massive parallelism, which makes it difficult to achieve representativeness while remaining time-efficient. This paper makes three key contributions. First, it presents the design of a fault-injection methodology to evaluate end-to-end reliability properties of application kernels running on GPUs. Second, it introduces a fault-injection tool that uses real GPU hardware and balances the representativeness and the efficiency of the fault-injection experiments. Third, it characterizes the error resilience of twelve GPGPU applications.
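To make the general idea of a fault-injection campaign concrete, the following is a minimal CPU-only sketch in Python. It is not the paper's tool, which runs on real GPU hardware. The single-bit-flip fault model, the stand-in `kernel` function, the tolerance, and the three outcome labels (`benign`, `sdc` for silent data corruption, `detectable`) are all assumptions chosen for illustration; none of them is taken from the abstract.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0..31) in the IEEE-754 binary32 encoding of `value`."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return out


def kernel(data, fault=None):
    """Hypothetical stand-in for an application kernel: sums its inputs.

    `fault` is an optional (index, bit) pair; the operand at `index`
    has `bit` flipped before it is used (assumed fault model).
    """
    total = 0.0
    for i, x in enumerate(data):
        if fault is not None and fault[0] == i:
            x = flip_bit(x, fault[1])
        total += x
    return total


def classify(golden: float, observed: float, tol: float = 1e-6) -> str:
    """Compare a faulty run against the fault-free ('golden') result."""
    if observed != observed or observed in (float("inf"), float("-inf")):
        return "detectable"  # NaN or infinity: easy to notice
    if abs(observed - golden) <= tol * max(1.0, abs(golden)):
        return "benign"      # the fault was masked
    return "sdc"             # wrong output with no visible symptom


def campaign(data, trials: int, seed: int = 0) -> dict:
    """Inject one random fault per run and tally the outcomes."""
    rng = random.Random(seed)
    golden = kernel(data)
    counts = {"benign": 0, "sdc": 0, "detectable": 0}
    for _ in range(trials):
        fault = (rng.randrange(len(data)), rng.randrange(32))
        counts[classify(golden, kernel(data, fault))] += 1
    return counts


if __name__ == "__main__":
    print(campaign([1.0, 2.0, 3.0, 4.0], trials=1000))
```

Injecting exactly one fault per run keeps each outcome attributable to a single corruption. The trade-off between representativeness and efficiency that the abstract mentions shows up here as the choice of how many trials to run and how fault sites are sampled.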