Publication | Closed Access
Impact of GPUs Parallelism Management on Safety-Critical and HPC Applications Reliability
81
Citations
19
References
2014
Year
Unknown Venue
EngineeringGpu BenchmarkingComputer ArchitectureGpu ComputingHpc Applications ReliabilityHardware SecurityReliability EngineeringScheduling StrainSystems EngineeringParallel ComputingHybrid Hpc WorkloadGpu Parallelism ManagementComputer EngineeringComputer ScienceGpu ClusterGpu ArchitectureProgram AnalysisGpu Cross SectionParallel ProgrammingGpus Parallelism ManagementGpu Virtualization
Graphics Processing Units (GPUs) offer high computational power but require high scheduling strain to manage parallel processes, which increases the GPU cross section. The results of extensive neutron radiation experiments performed on NVIDIA GPUs confirm this hypothesis. Reducing the application Degree Of Parallelism (DOP) reduces the scheduling strain but also modifies the GPU parallelism management, including memory latency, thread registers number, and the processors occupancy, which influence the sensitivity of the parallel application. An analysis on the overall GPU radiation sensitivity dependence on the code DOP is provided and the most reliable configuration is experimentally detected. Finally, modifying the parallel management affects the GPU cross section but also the code execution time and, thus, the exposure to radiation required to complete computation. The Mean Workload and Executions Between Failures metrics are introduced to evaluate the workload or the number of executions computed correctly by the GPU on a realistic application.
| Year | Citations | |
|---|---|---|
Page 1
Page 1