Publication | Closed Access
Parallel data-locality aware stencil computations on modern micro-architectures
Citations: 21
References: 11
Year: 2009
Venue: Unknown
Engineering, GPU Benchmarking, Computer Architecture, Biomedical Engineering, GPU Computing, High-performance Architecture, Stencil Computations, Modeling And Simulation, Parallel Computing, Massively-parallel Computing, Computer Engineering, Novel Micro-architectures, Computer Science, GPU Architecture, Hardware Acceleration, Parallel Programming, Modern Micro-architectures, Data-level Parallelism, Temporal Locality
Novel micro-architectures, including the Cell Broadband Engine Architecture (Cell BE) and graphics processing units (GPUs), are attractive platforms for compute-intensive simulations. This paper focuses on stencil computations arising in a biomedical simulation; it presents performance benchmarks on both the Cell BE and GPUs and contrasts them with a benchmark on a traditional CPU system. Because stencil computations have low arithmetic intensity, they typically reach only a fraction of the peak performance of the compute hardware. An algorithm is presented that reduces the bandwidth requirements, and thereby improves performance, by exploiting temporal locality of the data. We report on performance improvements over CPU implementations.
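
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to exploit temporal locality in a stencil code: overlapped (ghost-zone) time tiling. It is not claimed to be the paper's method. The 1D three-point averaging stencil and the names `step`, `naive`, and `time_blocked` are assumptions chosen for illustration. Each tile is loaded once together with a halo of `steps` cells on each side, advanced `steps` time steps locally, and only its interior is written back, so main-memory traffic per time step drops roughly by a factor of `steps` at the cost of some redundant halo computation.

```python
def step(u):
    """One sweep of a 3-point averaging stencil; the two end values stay fixed."""
    out = u[:]
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def naive(u, steps):
    """Reference version: every time step streams the whole array."""
    for _ in range(steps):
        u = step(u)
    return u


def time_blocked(u, steps, tile):
    """Overlapped time tiling (illustrative assumption, not the paper's code).

    Each tile is copied with a halo of `steps` cells per side, advanced
    `steps` time steps on the local copy, and only the tile interior is kept.
    Errors from the artificial halo edges move inward one cell per step, so
    after `steps` steps they have not yet reached the interior.
    """
    n = len(u)
    result = u[:]
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(start - steps, 0)   # halo clipped at the true domain boundary
        hi = min(end + steps, n)
        local = u[lo:hi]             # one load of tile plus halo
        for _ in range(steps):
            local = step(local)      # all time steps reuse the local copy
        result[start:end] = local[start - lo:end - lo]
    return result
```

On real hardware the local copy would live in fast memory (for example a Cell BE local store or GPU shared memory), and the tile size would be chosen so that tile plus halo fits there; those choices are outside the scope of this sketch.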