Publication | Closed Access
Cache and Bandwidth Aware Matrix Multiplication on the GPU
Citations: 71
References: 4
Year: 2003
Venue: Unknown
Topics: Recent Advances, Computational Science, GPU Architecture, Engineering, Array Computing, GPU Benchmarking, GPU Cluster, Computer Engineering, Computer Architecture, Parallel Programming, Computer Science, GPU Hardware, Parallel Computing, Computational Geometry, Memory Bandwidth, GPU Computing
Recent advances in the speed and programmability of consumer-level graphics hardware have sparked a flurry of research that goes beyond the realm of image synthesis and computer graphics. We examine the use of the GPU (graphics processing unit) as a tool for scientific computing by analyzing techniques for performing large matrix multiplications in GPU hardware. An earlier method for multiplying matrices on the GPU was limited by memory bandwidth. This paper examines more efficient algorithms that make large matrix multiplication on upcoming GPU architectures more competitive, using only 25% of the memory bandwidth and instructions of previous GPU algorithms.
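The abstract does not spell out the algorithms themselves, so the following is only a generic sketch of why blocking reduces memory traffic in matrix multiplication; it is not the paper's GPU method. It is written in plain Python for readability, and the function names `matmul_naive` and `matmul_blocked`, the `tile` parameter, and the load-counting model (one "load" per operand element fetched) are all assumptions made for illustration.

```python
def matmul_naive(A, B):
    """Return (C, loads) where C = A * B and loads counts operand fetches.

    Hypothetical cost model: every multiply-add fetches one element of A
    and one element of B, with no reuse between output elements.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    loads = 0
    for i in range(n):
        for j in range(p):
            acc = 0.0
            for k in range(m):
                acc += A[i][k] * B[k][j]
                loads += 2  # one element of A, one element of B
            C[i][j] = acc
    return C, loads


def matmul_blocked(A, B, tile=2):
    """Same product, computed one tile x tile block of C at a time.

    For each k, the block fetches `tile` elements of A and `tile` elements
    of B once, then reuses them for all tile * tile multiply-adds.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    loads = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, p, tile):
            rows = range(i0, min(i0 + tile, n))
            cols = range(j0, min(j0 + tile, p))
            for k in range(m):
                a = [A[i][k] for i in rows]  # column slice of A, fetched once
                b = [B[k][j] for j in cols]  # row slice of B, fetched once
                loads += len(a) + len(b)
                for ii, i in enumerate(rows):
                    for jj, j in enumerate(cols):
                        C[i][j] += a[ii] * b[jj]
    return C, loads


if __name__ == "__main__":
    size = 8
    A = [[float(i + j) for j in range(size)] for i in range(size)]
    B = [[float(i * j % 5) for j in range(size)] for i in range(size)]
    _, naive_loads = matmul_naive(A, B)
    for t in (2, 4):
        _, blocked_loads = matmul_blocked(A, B, tile=t)
        print(t, blocked_loads / naive_loads)
```

Under this simplified model the number of fetches per output element falls in proportion to the tile width, so larger tiles trade register or cache space for bandwidth. The ratio printed here depends entirely on the chosen tile size and is not meant to reproduce the 25% figure reported in the abstract.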