Publication | Closed Access
Effective stream-based and execution-based data prefetching
Citations: 78
References: 28
Year: 2004
Venue: Unknown
Keywords: Cluster Computing, Execution-based Data Prefetching, Engineering, Computer Architecture, Data Streaming Architecture, Processor Architecture, High-performance Architecture, System Software, Parallel Computing, Manycore Processor, Data Management, Streaming Engine, Computer Engineering, Computer Science, Data Stream Management, Memory Architecture, Processor Speeds, Cache-missing Loads, Parallel Programming, SPEC CPU2000
With processor speeds continuing to outpace the memory subsystem, cache-missing memory operations are becoming increasingly important to application performance. In response to this trend, most modern processors now support hardware (HW) prefetchers, which reduce the number of missing loads observed by an application. This paper analyzes the behavior of cache-missing loads in SPEC CPU2000 and highlights the inability of unit-stride and single non-unit-stride prefetchers to correctly prefetch for some commonly occurring streams. In response to this analysis, a novel multi-stride prefetcher that supports streams with up to four distinct strides is proposed. Performance analysis for SPEC CPU2000 shows that the proposed multi-stride prefetcher can outperform current stride prefetchers on several benchmarks, most notably on mcf, lucas, and facerec, where it achieves an additional performance gain of up to 57%. The performance of the strided HW prefetchers is also contrasted with another recently proposed prefetch scheme, runahead execution (RAE), and the synergy between the schemes is investigated.
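To make the multi-stride idea concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes that a multi-stride stream can be modelled as a repeating pattern of at most four address deltas, and that a pattern counts as learned once it has been observed twice in a row. The names `MAX_STRIDES`, `detect_period`, and `predict_next` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional

# Assumption: "up to four distinct strides" is modelled here as a
# repeating delta pattern whose period is at most four.
MAX_STRIDES = 4


def detect_period(deltas: List[int], max_period: int = MAX_STRIDES) -> Optional[int]:
    """Return the smallest period p (1..max_period) for which the most
    recent 2*p deltas consist of the same p-long pattern twice."""
    for p in range(1, max_period + 1):
        if len(deltas) < 2 * p:
            break  # not enough history to confirm this or any longer period
        recent = deltas[-2 * p:]
        if recent[:p] == recent[p:]:
            return p
    return None


def predict_next(addresses: List[int], max_period: int = MAX_STRIDES) -> Optional[int]:
    """Predict the next miss address of a stream, or None if no repeating
    stride pattern has been confirmed yet."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    p = detect_period(deltas, max_period)
    if p is None:
        return None
    # The next delta is the one seen exactly one period ago.
    return addresses[-1] + deltas[-p]


# Hypothetical stream alternating strides of 16 and 112 bytes.
two_stride_stream = [0, 16, 128, 144, 256]
next_address = predict_next(two_stride_stream)
```

For the hypothetical stream above, a predictor that simply repeats the last observed stride would add 112 to the final address, whereas the pattern-based sketch adds 16, the stride that actually comes next in the alternation. This is the kind of stream that the abstract says unit and single non-unit stride prefetchers fail to cover.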