Publication | Closed Access

A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication

Year: 1994

Citations: 87

References: 10

Abstract

In this paper, we propose a scheme for matrix-matrix multiplication on a distributed-memory parallel computer. The scheme hides almost all of the communication cost by overlapping it with computation, and it uses the standard, optimized Level-3 BLAS operation on each node. As a result, the overall performance of the scheme is nearly equal to the performance of the optimized Level-3 BLAS operation on one node times the number of nodes in the computer, which is the peak performance obtainable for parallel BLAS. Another feature of our algorithm is that it can reach this peak performance for sufficiently large matrices, even if the underlying communication network of the computer is slow.
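The abstract's last claim can be illustrated with a simple cost model. The sketch below is not the paper's algorithm or its analysis. It assumes a hypothetical 1D ring scheme in which each of p nodes holds n/p rows of A and B, multiplies one local block per step, and passes a panel of B to its neighbour. The function names, the machine parameters, and the 8-byte word size are all illustrative assumptions.

```python
def step_times(n, p, flops_per_s, bytes_per_s, latency_s=0.0, word_bytes=8):
    """Return (compute, communicate) seconds for one ring step on one node.

    Assumed layout: each node owns n/p rows of A and of B.
    """
    rows = n / p
    # Local GEMM: (rows x rows) block of A times (rows x n) panel of B.
    compute = 2.0 * rows * rows * n / flops_per_s
    # Shipping one (rows x n) panel of B to the neighbouring node.
    communicate = latency_s + word_bytes * rows * n / bytes_per_s
    return compute, communicate


def efficiency(n, p, flops_per_s, bytes_per_s, overlap, latency_s=0.0):
    """Fraction of the ideal speed (p times one node's GEMM rate) achieved."""
    compute, communicate = step_times(n, p, flops_per_s, bytes_per_s, latency_s)
    ideal = p * compute  # p compute steps, no communication cost at all
    if overlap:
        # Each of the p-1 transfers runs concurrently with a compute step,
        # so a step costs whichever of the two is slower; the last step
        # has nothing left to fetch.
        actual = compute + (p - 1) * max(compute, communicate)
    else:
        # Transfers and compute steps strictly alternate.
        actual = p * compute + (p - 1) * communicate
    return ideal / actual


if __name__ == "__main__":
    # Illustrative machine: 16 nodes, 100 Mflop/s each, 10 MB/s links.
    for n in (256, 512, 1024, 2048, 4096):
        with_overlap = efficiency(n, 16, 1e8, 1e7, overlap=True)
        without = efficiency(n, 16, 1e8, 1e7, overlap=False)
        print(f"n={n:5d}  overlapped={with_overlap:.3f}  serialized={without:.3f}")
```

Under these assumptions the compute time per step grows as n^3 / p^2 while the transfer time grows only as n^2 / p, so once n is large enough the transfer always finishes before the local multiplication does and the overlapped efficiency reaches 1. A slower network only raises the matrix size at which that happens, which is the behaviour the abstract describes.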
