Concepedia

Abstract

Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of the iteration space and a set of tiling directions such that all tiles along that face can be started concurrently. This provides load balance and maximizes parallelism. However, existing automatic tiling frameworks often choose hyperplanes that lead to pipelined start-up and load imbalance. We address this issue with a new tiling technique, called diamond tiling, that ensures concurrent start-up as well as perfect load-balance whenever possible. We first provide necessary and sufficient conditions for a set of tiling hyperplanes to allow concurrent start for programs with affine data accesses. We then provide an approach to automatically find such hyperplanes. Experimental evaluation on a 12-core Intel Westmere shows that diamond tiled code is able to outperform a tuned domain-specific stencil code generator by 10 to 40 percent, and previous compiler techniques by a factor of 1.3x to 10.1x.

References

YearCitations

Page 1