Publication | Closed Access
STM2: A Parallel STM for High Performance Simultaneous Multithreading Systems
Citations: 15
References: 30
Year: 2011
Unknown Venue
Engineering, Auxiliary Threads, Computer Architecture, Multithreading (Computer Architecture), Memory Model (Programming), Hardware Systems, Parallel Algorithms, Modern Chip Multithreading, Systems Engineering, Parallel Computing, Compilers, Large CMT Systems, Concurrent Programming, Computer Engineering, Distributed Systems, Computer Science, Parallel STM, Program Analysis, Parallel Processing, Parallel Performance Evaluation, Parallel Programming, Asynchronous Systems, System Software, Transactional Memory
Extracting high performance from modern chip multithreading (CMT) processors is a complex task, especially for large CMT systems. Programmers must efficiently parallelize performance-critical software while avoiding deadlocks and race conditions. Transactional memory (TM) is a promising programming model that allows programmers to focus on parallelism rather than on maintaining correctness and avoiding deadlock. Software-only implementations (STMs) are especially compelling because they run on commodity hardware and are therefore highly portable. Unfortunately, STM systems usually suffer from high overheads, which may limit their use, especially at scale. In this paper we present STM², a novel parallel STM designed for high-performance, aggressively multithreaded systems. STM² significantly lowers runtime overhead by offloading read-set validation, bookkeeping, and conflict detection to auxiliary threads running on sibling hardware threads. Auxiliary threads perform STM operations in parallel with their paired application threads and absorb STM overhead, significantly improving performance. We exploit the fact that, on modern multi-core processors, sets of cores can share L1 or L2 caches. This lets us achieve closer coupling between the application thread and the auxiliary thread than is possible on a traditional multi-processor system. Our experiments, run on an IBM POWER7 machine, a state-of-the-art, aggressively multithreaded system, show that our approach outperforms several well-known STM implementations. In particular, STM² shows average speedups between 1.8x and 5.2x over the tested STM systems, with peaks of up to 12.8x.
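The division of labor described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm, whose details the abstract does not give. It only shows one plausible shape of the idea: the application thread logs each read and continues, while a paired auxiliary thread keeps the read set and validates it at commit time. All names here (`Cell`, `AuxThread`, `Transaction`, `COMMIT_LOCK`) are hypothetical, and the single global commit lock and per-cell version counters are simplifying assumptions. Because of CPython's global interpreter lock, the sketch shows the structure only, not any speedup.

```python
import queue
import threading

VALIDATE = object()  # hypothetical sentinel: "validate the read set now"
COMMIT_LOCK = threading.Lock()  # assumption: one global lock serialises commits


class Cell:
    """A shared word with a version counter bumped on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class AuxThread(threading.Thread):
    """Helper thread that owns the read set of its paired application thread."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()    # read-log entries and validation requests
        self.verdict = queue.Queue()  # True/False answers to validation requests
        self.read_set = []

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is VALIDATE:
                # The read set is valid if no logged cell has changed version.
                ok = all(cell.version == seen for cell, seen in self.read_set)
                self.read_set = []
                self.verdict.put(ok)
            else:
                self.read_set.append(msg)  # bookkeeping done off the app thread


class Transaction:
    """Runs on the application thread; delegates validation to the aux thread."""

    def __init__(self, aux):
        self.aux = aux
        self.writes = {}  # buffered writes, applied only on successful commit

    def read(self, cell):
        if cell in self.writes:
            return self.writes[cell]
        seen = cell.version  # version first, so a racing write is caught later
        value = cell.value
        self.aux.inbox.put((cell, seen))  # log the read and keep going
        return value

    def write(self, cell, value):
        self.writes[cell] = value

    def commit(self):
        with COMMIT_LOCK:
            self.aux.inbox.put(VALIDATE)
            if not self.aux.verdict.get():
                return False  # conflict detected: caller should retry
            for cell, value in self.writes.items():
                cell.value = value
                cell.version += 1
            return True
```

A caller would start one `AuxThread` per application thread, wrap each atomic region in a `Transaction`, and retry when `commit()` returns `False`. In a real system the auxiliary thread would be pinned to a sibling hardware thread that shares a cache with the application thread, which this sketch does not attempt.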