Publication | Closed Access
Multi-Chain Prefetching: Effective Exploitation of Inter-Chain Memory Parallelism for Pointer-Chasing Codes
28
Citations
16
References
2001
Year
EngineeringComputer ArchitectureScheduling AlgorithmMulti-chain PrefetchingMemory Model (Programming)Hardware SecurityShared MemoryJump PointerHigh-performance ArchitectureParallel ComputingMemory ManagementInstruction-level ParallelismInter-chain Memory ParallelismComputer EngineeringComputer ScienceEffective ExploitationProgram AnalysisFormal MethodsParallel ProgrammingConcurrent Data StructureData-level ParallelismSystem Software
Abstract: Pointer-chasing applications tend to traverse composed data structures consisting of multiple independent pointer chains. While the traversal of any single pointer chain leads to the serialization of memory operations, the traversal of independent pointer chains provides a source of memory parallelism. This paper presents multi-chain prefetching, a technique that utilizes off-line analysis and a hardware prefetch engine to prefetch multiple independent pointer chains simultaneously, thus exploiting inter-chain memory parallelism for the purpose of memory latency tolerance. This paper makes three contributions. First, we introduce a scheduling algorithm that identifies independent pointer chains in pointer-chasing codes and computes a prefetch schedule that overlaps serialized cache misses across separate chains. Our analysis focuses on static traversals. We also propose using speculation to identify independent pointer chains in dynamic traversals. Second, we present the design of a prefetch engine that traverses pointer-based data structures and overlaps multiple pointer chains according to our scheduling algorithm. Finally, we conduct an experimental evaluation of multi-chain prefetching and compare its performance against two existing techniques, jump pointer prefetching [9] and prefetch arrays [6]. Our results show multi-chain prefetching improves execution time by 40% across six pointer-chasing kernels from the Olden benchmark suite [14], and by 8% across four SPECInt CPU2000 benchmarks. Multi-chain prefetching also outperforms jump pointer prefetching and prefetch arrays by 28% on Olden, and by 12% on SPECInt. Furthermore, speculation can enable multi-chain prefetching for some dynamic traversal codes, but our technique loses its effectiveness when the pointer-chain traversal order is unpredictable. Finally, we also show that combining multi-chain prefetching with prefetch arrays can potentially provide higher performance than either technique alone.
| Year | Citations | |
|---|---|---|
Page 1
Page 1