Publication | Closed Access
Multi-faceted microarchitecture level reliability characterization for NVIDIA and AMD GPUs
Citations: 19
References: 18
Year: 2018
Unknown Venue
Consolidated Workflow, Extreme Throughput, Engineering, GPU Benchmarking, Computer Architecture, System Reliability, GPU Computing, Hardware Security, Reliability Engineering, AMD GPUs, Parallel Computing, Reliability, Hardware Reliability, Computer Engineering, GPU Chips, Computer Science, GPU Cluster, GPU Architecture, Circuit Reliability, GPU Virtualization
State-of-the-art GPU chips are designed to deliver extreme throughput for graphics as well as for data-parallel general-purpose computing workloads (GPGPU computing). Unlike graphics, GPGPU computing requires highly reliable operation. Because provisioning for high reliability may affect performance, the design of GPGPU systems requires that the vulnerability of GPU workloads to soft errors be evaluated jointly with the performance of GPU chips. We present an extended study, based on a consolidated workflow, that evaluates reliability in correlation with performance for four GPU architectures and corresponding chips: AMD Southern Islands and NVIDIA G80, GT200, and Fermi. We obtained reliability measurements (Architectural Vulnerability Factor, AVF, and Failures in Time, FIT) using both fault injection and ACE analysis on microarchitecture-level simulators. Beyond the reliability-only and performance-only measurements, we propose combined performance and reliability metrics that support comparisons of the same application across GPU chips of different ISAs and vendors, as well as across benchmarks on the same GPU chip.
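The abstract gives no formulas, so the following is only a minimal sketch of how AVF and FIT are commonly derived from a fault-injection campaign. It is not the authors' workflow. The outcome labels, the raw per-bit FIT value, the structure size, and the `failures_per_execution` helper are all hypothetical, and the last of these only illustrates how a reliability figure might be combined with execution time; it is not the combined metric proposed in the paper.

```python
from collections import Counter


def avf_from_injections(outcomes):
    """Estimate AVF as the fraction of injected faults that are not masked.

    `outcomes` is a list of labels, one per injection run. The labels
    ("masked", "sdc", "crash") are hypothetical names for this sketch.
    """
    if not outcomes:
        raise ValueError("at least one injection outcome is required")
    counts = Counter(outcomes)
    failures = sum(n for label, n in counts.items() if label != "masked")
    return failures / len(outcomes)


def fit_rate(avf, raw_fit_per_bit, num_bits):
    """FIT of a hardware structure: AVF x raw FIT per bit x number of bits."""
    return avf * raw_fit_per_bit * num_bits


def failures_per_execution(fit, exec_seconds):
    """Expected failures in one run of length `exec_seconds`.

    FIT counts failures per 1e9 device-hours, so this scales the rate to
    the duration of a single execution. Illustrative assumption only.
    """
    return fit / 1e9 * (exec_seconds / 3600.0)


# Hypothetical campaign: 100 injections into one structure.
outcomes = ["masked"] * 70 + ["sdc"] * 20 + ["crash"] * 10
avf = avf_from_injections(outcomes)

# Hypothetical technology parameters: 0.001 FIT per bit, 1 Mbit structure.
fit = fit_rate(avf, raw_fit_per_bit=1e-3, num_bits=1_000_000)

print(f"AVF = {avf:.2f}, FIT = {fit:.1f}")
print(f"failures per 2 s run = {failures_per_execution(fit, 2.0):.3e}")
```

Weighting the failure rate by execution time is one simple way to see why reliability and performance need to be assessed together: at equal FIT, a chip that finishes the same workload sooner is exposed to soft errors for a shorter time.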