Publication | Closed Access

Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors

Year: 2002
Citations: 80
References: 15

Abstract

Run-time parallelization is often the only way to execute code in parallel when data dependence information is incomplete at compile time. This situation is common in many important applications. Unfortunately, known techniques for run-time parallelization are often computationally expensive or not general enough. To address this problem, we propose new hardware support for efficient run-time parallelization in distributed shared-memory (DSM) multiprocessors. The idea is to execute the code in parallel speculatively and use extensions to the cache coherence protocol hardware to detect any dependence violations. As soon as such a violation is detected, execution stops, the state is restored, and the code is re-executed serially. The scheme, which we apply to loops, allows iterations to execute and complete in potentially any order. It requires hardware extensions to the cache coherence protocol and memory hierarchy of a DSM, and it has low overhead. We present the algorithms and a hardware design of the scheme. Overall, the scheme delivers an average loop speedup of 7.3 on 16 processors and is 50% faster than a related software-only method.
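
The speculate, detect, restore, and re-execute cycle described in the abstract can be illustrated with a minimal software sketch. This is not the paper's hardware design: the names `speculative_loop`, `DependenceViolation`, `read`, and `write` are hypothetical, out-of-order completion is emulated by shuffling the iteration order on a single thread, and the dependence test is a deliberately conservative assumption (any element touched by two different iterations, with at least one write, counts as a violation).

```python
import random


class DependenceViolation(Exception):
    """Raised when two iterations conflict on the same element (hypothetical name)."""


def speculative_loop(data, n_iters, body, seed=0):
    """Run `body` for each iteration in a shuffled order.

    Returns True if speculation succeeded, False if a violation forced
    a restore and a serial re-execution. `data` is updated in place.
    """
    checkpoint = list(data)   # saved state, restored on a violation
    writer = {}               # element index -> iteration that wrote it
    readers = {}              # element index -> set of iterations that read it

    def accessors(it):
        # Tracked accessors stand in for the coherence-protocol extensions.
        def read(i):
            if writer.get(i, it) != it:
                raise DependenceViolation
            readers.setdefault(i, set()).add(it)
            return data[i]

        def write(i, value):
            if writer.get(i, it) != it or readers.get(i, set()) - {it}:
                raise DependenceViolation
            writer[i] = it
            data[i] = value

        return read, write

    order = list(range(n_iters))
    random.Random(seed).shuffle(order)   # emulate arbitrary completion order
    try:
        for it in order:
            body(it, *accessors(it))
        return True
    except DependenceViolation:
        data[:] = checkpoint             # restore state
        for it in range(n_iters):        # re-execute serially, untracked
            body(it, data.__getitem__, data.__setitem__)
        return False


def independent(i, read, write):
    write(i, read(i) * 2)                # each iteration touches only element i


def chained(i, read, write):
    if i > 0:
        write(i, read(i - 1) + 1)        # iteration i depends on iteration i - 1


a = [1, 2, 3, 4]
ok_a = speculative_loop(a, 4, independent)

b = [1, 0, 0, 0]
ok_b = speculative_loop(b, 4, chained)
```

In this sketch the independent loop completes speculatively, while the chained loop triggers a violation whatever the shuffled order, is rolled back to the checkpoint, and produces the serial result. The software bookkeeping on every access is exactly the kind of overhead the proposed hardware support is meant to avoid.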
