Publication | Closed Access
Thread-Shared Software Code Caches
Citations: 36
References: 38
Year: 2006
Unknown Venue
Software Maintenance, Engineering, Computer Architecture, Software Engineering, Software Analysis, Shared Memory, Thread-shared Code Caches, Compilers, Parallel Computing, Web Cache, Code Caches, Concurrent Programming, Computer Engineering, Caching, Computer Science, Program Analysis, Software Testing, Cloud Computing, Parallel Programming, Software Code Caches, System Software
Software code caches are increasingly being used to amortize the runtime overhead of dynamic optimizers, simulators, emulators, dynamic translators, dynamic compilers, and other tools. Despite the now-widespread use of code caches, techniques for efficiently sharing them across multiple threads have not been fully explored. Some systems simply do not support threads, while others resort to thread-private code caches. Although thread-private caches are much simpler to manage, synchronize, and provide scratch space for, they simply do not scale when applied to many-threaded programs. Thread-shared code caches are needed to target server applications, which employ hundreds of worker threads all performing similar tasks. Yet, those systems that do share their code caches often have brute-force, inefficient solutions to the challenges of concurrent code cache access: a single global lock on runtime system code and suspension of all threads for any cache management action. This limits the possibilities for cache design and has performance problems with applications that require frequent cache invalidations to maintain cache consistency. In this paper, we discuss the design choices when building thread-shared code caches and enumerate the difficulties of thread-local storage, synchronization, trace building, in-cache lookup tables, and cache eviction. We present efficient solutions to these problems that both scale well and do not require thread suspension. We evaluate our results in DynamoRIO, an industrial-strength dynamic binary translation system, on real-world server applications. On these applications our thread-shared caches use an order of magnitude less memory and improve throughput by up to four times compared to thread-private caches.
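The memory argument in the abstract can be illustrated with a toy model. The sketch below is not DynamoRIO's implementation and does not reproduce the paper's synchronization scheme; `translate`, `run_private`, and `run_shared` are hypothetical names, and the single lock in the shared variant exists only to keep the toy correct. It shows only why thread-private caches grow with the thread count when many worker threads execute the same code, while a shared cache holds one copy of each translated block.

```python
import threading


def translate(addr):
    # Hypothetical stand-in for translating one basic block of application code.
    return ("translated", addr)


def run_private(num_threads, blocks):
    """Each thread fills its own cache; returns total cached entries."""
    caches = [dict() for _ in range(num_threads)]

    def worker(cache):
        for addr in blocks:
            if addr not in cache:
                cache[addr] = translate(addr)

    threads = [threading.Thread(target=worker, args=(c,)) for c in caches]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(len(c) for c in caches)


def run_shared(num_threads, blocks):
    """All threads fill one cache; returns total cached entries."""
    cache = {}
    lock = threading.Lock()  # simplification for this toy, not the paper's approach

    def worker():
        for addr in blocks:
            with lock:
                if addr not in cache:
                    cache[addr] = translate(addr)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(cache)


if __name__ == "__main__":
    blocks = range(100)
    print("private entries:", run_private(8, blocks))
    print("shared entries:", run_shared(8, blocks))
```

With eight threads all executing the same 100 blocks, the private variant stores every block once per thread, whereas the shared variant stores each block once. The toy says nothing about throughput, which depends on the synchronization design the paper discusses.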