Publication | Closed Access
Store Memory-Level Parallelism Optimizations for Commercial Applications
Citations: 39
References: 41
Year: 2006
Unknown Venue
Cluster Computing, Engineering, Computer Architecture, Processor Architecture, Processor Performance, Hardware Security, Parallel Software, High-performance Architecture, Parallel Computing, Manycore Processor, Massively-parallel Computing, Computer Engineering, Computer Science, Microelectronics, Memory Architecture, Hardware Scout, Epoch MLP Model, Parallel Programming, Memory-level Parallelism Optimizations, Data-level Parallelism
This paper studies the impact of off-chip store misses on processor performance for modern commercial applications. The performance impact of off-chip store misses is largely determined by the extent of their overlap with other off-chip cache misses. The epoch MLP model is used to explain and quantify how these overlaps are affected by various store handling optimizations and by the memory consistency model implemented by the processor. The extent of these overlaps is then translated to off-chip CPI. Experimental results show that store handling optimizations are crucial for mitigating the substantial performance impact of stores in commercial applications. While some previously proposed optimizations, such as store prefetching, are highly effective, they are unable to fully mitigate the performance impact of off-chip store misses and they also leave a performance gap between the stronger and weaker memory consistency models. New optimizations, such as the store miss accelerator, an optimization of hardware scout and a new application of speculative lock elision, are demonstrated to virtually eliminate the impact of off-chip store misses.
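The step from overlap to off-chip CPI can be illustrated with a small sketch. It assumes the commonly used first-order relation in which off-chip CPI equals off-chip misses per instruction times miss latency, divided by the average number of overlapping misses (MLP). The abstract does not state this formula, so the relation, the function name `offchip_cpi`, and the input values below are illustrative assumptions rather than the paper's own model or data.

```python
def offchip_cpi(misses_per_kilo_instr, miss_latency_cycles, mlp):
    """First-order off-chip CPI estimate (assumed relation, not from the paper).

    misses_per_kilo_instr: off-chip misses per 1000 instructions
    miss_latency_cycles:   latency of one off-chip miss, in cycles
    mlp:                   average number of off-chip misses that overlap
    """
    if mlp < 1:
        raise ValueError("mlp must be at least 1 (1 means no overlap)")
    return misses_per_kilo_instr * miss_latency_cycles / (1000 * mlp)


# Hypothetical inputs: 5 off-chip misses per 1000 instructions, 400-cycle latency.
serialized = offchip_cpi(5, 400, 1.0)  # every miss is fully exposed
overlapped = offchip_cpi(5, 400, 2.0)  # on average two misses overlap
print(serialized, overlapped)
```

Under this assumed relation, doubling the overlap halves the off-chip CPI, which is why optimizations that let store misses overlap with other off-chip misses matter for performance.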