Concepedia

TLDR

Writing parallel applications for computational grids is challenging: algorithms designed for local area networks must be adapted to differing link speeds, and collective operations such as broadcast and reduce are an important case. MAGPIE is a library of collective communication operations optimized for wide-area systems. Its algorithms send the minimal amount of data over slow wide-area links and incur only a single wide-area latency, and existing MPI applications can run unmodified on geographically distributed systems. On moderate cluster sizes, with a wide-area latency of 10 ms and a bandwidth of 1 MB/s, MAGPIE executes operations up to 10 times faster than MPICH, and application kernels improve by up to a factor of 4. MAGPIE's advantage grows with higher wide-area latencies.

Abstract

Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the differences in link speeds. One important class of algorithms is collective operations, such as broadcast and reduce. We have developed MAGPIE, a library of collective communication operations optimized for wide area systems. MAGPIE's algorithms send the minimal amount of data over the slow wide area links, and only incur a single wide area latency. Using our system, existing MPI applications can be run unmodified on geographically distributed systems. On moderate cluster sizes, using a wide area latency of 10 milliseconds and a bandwidth of 1 MByte/s, MAGPIE executes operations up to 10 times faster than MPICH, a widely used MPI implementation; application kernels improve by up to a factor of 4. Due to the structure of our algorithms, MAGPIE's advantage increases for higher wide area latencies.
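
The two properties named in the abstract (minimal data over wide area links, and a single wide area latency) can be illustrated with a small simulation. The sketch below is not MAGPIE's code and uses no MPI calls. It assumes a two-level broadcast in which the root sends once to a coordinator in each remote cluster and each coordinator then broadcasts locally, and it compares that with a topology-unaware binomial tree. The function names (`binomial_tree`, `flat_broadcast`, `cluster_aware_broadcast`, `wan_stats`), the choice of coordinator, and the layout of 4 clusters with 4 ranks each are all assumptions made for illustration.

```python
from collections import defaultdict


def binomial_tree(ranks, root):
    """Return (parent, child) edges of a binomial broadcast tree rooted at root."""
    order = [root] + [r for r in ranks if r != root]
    edges = []
    step = 1
    while step < len(order):
        # In each round, every rank that already holds the data sends once.
        for i in range(step):
            j = i + step
            if j < len(order):
                edges.append((order[i], order[j]))
        step *= 2
    return edges


def flat_broadcast(cluster_of, root):
    """Topology-unaware broadcast: one binomial tree over all ranks."""
    return binomial_tree(sorted(cluster_of), root)


def cluster_aware_broadcast(cluster_of, root):
    """Two-level broadcast (illustrative assumption, not MAGPIE's implementation)."""
    members = defaultdict(list)
    for rank, cluster in sorted(cluster_of.items()):
        members[cluster].append(rank)
    edges = []
    for cluster, ranks in members.items():
        if cluster == cluster_of[root]:
            coordinator = root
        else:
            coordinator = ranks[0]  # hypothetical choice: lowest rank in the cluster
            edges.append((root, coordinator))  # the only wide-area message to this cluster
        edges.extend(binomial_tree(ranks, coordinator))  # local-area messages only
    return edges


def wan_stats(edges, cluster_of, root):
    """Return (wide-area messages, max wide-area hops on any root-to-rank path)."""
    parent = {child: par for par, child in edges}

    def crossings(rank):
        count = 0
        while rank != root:
            if cluster_of[rank] != cluster_of[parent[rank]]:
                count += 1
            rank = parent[rank]
        return count

    wan_messages = sum(cluster_of[p] != cluster_of[c] for p, c in edges)
    return wan_messages, max(crossings(r) for r in cluster_of)


# Assumed layout: 16 ranks, 4 clusters of 4 consecutive ranks, root at rank 0.
cluster_of = {rank: rank // 4 for rank in range(16)}
flat = wan_stats(flat_broadcast(cluster_of, 0), cluster_of, 0)
aware = wan_stats(cluster_aware_broadcast(cluster_of, 0), cluster_of, 0)
print("flat binomial tree (WAN messages, WAN hops):", flat)
print("cluster-aware tree (WAN messages, WAN hops):", aware)
```

Under these assumptions, the flat tree sends 12 messages across cluster boundaries and some ranks sit two wide-area hops from the root, while the cluster-aware tree sends 3 wide-area messages (one per remote cluster) and no rank is more than one wide-area hop away. The sketch counts messages and hops only. It does not model bandwidth, message size, or timing, so it says nothing about the speedups reported above.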
