Publication | Closed Access
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
Citations: 38
References: 30
Year: 2006
Unknown Venue
Engineering; Computer Architecture; Memory Model (Programming); Software Analysis; Address-indexed Memory Disambiguation; Hardware Security; Shared Memory; High-performance Architecture; Store Instructions; Parallel Computing; Superscalar Processors; Memory Management; Data Management; Conventional Load/store Queue; Computer Engineering; Computer Science; Memory Architecture; External-memory Algorithm; Program Analysis; Parallel Programming
This paper describes a scalable, low-complexity alternative to the conventional load/store queue (LSQ) for superscalar processors that execute load and store instructions speculatively and out of order prior to resolving their dependences. Whereas the LSQ requires associative and age-prioritized searches for each access, we propose that an address-indexed store-forwarding cache (SFC) perform store-to-load forwarding and that an address-indexed memory disambiguation table (MDT) perform memory disambiguation. Neither structure includes a CAM. The SFC behaves as a small cache, accessed speculatively and out of order by both loads and stores. Because the SFC does not rename in-flight stores to the same address, violations of memory anti and output dependences can cause in-flight loads to obtain incorrect values from the SFC. Therefore, the MDT uses sequence numbers to detect and recover from true, anti, and output memory dependence violations. We observe empirically that loads and stores that violate anti and output memory dependences are rarely on a program's critical path and that the additional cost of enforcing predicted anti and output dependences among these loads and stores is minimal. In conjunction with a scheduler that enforces predicted anti and output dependences, the MDT and SFC yield performance equivalent to that of a large LSQ that has similar or greater circuit complexity. The SFC and MDT are scalable structures that yield high performance and lower dynamic power consumption than the LSQ, and they are well suited for checkpointed processors with large instruction windows.
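The mechanism the abstract describes can be illustrated with a small software model. The sketch below is not the authors' design: the class names, the table size, the untagged direct-mapped indexing, and the rule of keeping only the highest load and store sequence number per entry are all assumptions made for illustration. It shows only how comparing sequence numbers at an address-indexed entry can flag true, anti, and output dependence violations without an associative search; recovery and entry deallocation are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MDTEntry:
    # Highest sequence number (program order) seen so far; -1 means none.
    load_seq: int = -1
    store_seq: int = -1


class MemoryDisambiguationTable:
    """Hypothetical address-indexed table; no associative (CAM) search."""

    def __init__(self, num_entries: int = 64) -> None:
        self.num_entries = num_entries
        self.entries = [MDTEntry() for _ in range(num_entries)]

    def _entry(self, addr: int) -> MDTEntry:
        # Assumption: untagged, direct-mapped. Different addresses that alias
        # to one entry can only cause conservative (false) violations.
        return self.entries[addr % self.num_entries]

    def on_load(self, addr: int, seq: int) -> Optional[str]:
        e = self._entry(addr)
        if e.store_seq > seq:
            # A younger store already executed: the load may have read
            # that store's value from the SFC.
            return "anti"
        e.load_seq = max(e.load_seq, seq)
        return None

    def on_store(self, addr: int, seq: int) -> Optional[str]:
        e = self._entry(addr)
        if e.load_seq > seq:
            # A younger load executed before this older store.
            return "true"
        if e.store_seq > seq:
            # A younger store already wrote this location.
            return "output"
        e.store_seq = seq
        return None


class StoreForwardingCache:
    """Hypothetical direct-mapped SFC holding one (addr, value) per line."""

    def __init__(self, num_entries: int = 64) -> None:
        self.num_entries = num_entries
        self.lines: List[Optional[Tuple[int, int]]] = [None] * num_entries

    def store(self, addr: int, value: int) -> None:
        # No renaming: a later-executing store simply overwrites the line.
        self.lines[addr % self.num_entries] = (addr, value)

    def load(self, addr: int) -> Optional[int]:
        line = self.lines[addr % self.num_entries]
        if line is not None and line[0] == addr:
            return line[1]  # forwarded from an in-flight store
        return None  # miss: the load would read the data cache instead
```

In this model, a load with sequence number 2 that executes before a store with sequence number 1 to the same address is recorded without complaint, and the later call `on_store(addr, 1)` returns `"true"`, which is the point at which a real pipeline would trigger recovery.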