Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

TLDR

The LRU policy implicitly partitions a shared cache on demand, yet higher demand does not always translate into greater performance, so allocating cache based on potential benefit is preferable. This paper investigates how to partition a shared cache among concurrently running applications. The authors propose a low‑overhead, runtime utility‑based cache partitioning mechanism that monitors each application with a <2 kB hardware circuit and uses the collected data to allocate cache resources according to expected miss reduction. Evaluation on 20 multiprogrammed workloads shows that UCP boosts dual‑core performance by up to 23 % and on average 11 % over LRU‑based partitioning.

Abstract

This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, giving more cache resources to the application that has a high demand and fewer cache resources to the application that has a low demand. However, a higher demand for cache resources does not always correlate with a higher performance from additional cache resources. It is beneficial for performance to invest cache resources in the application that benefits more from the cache resources rather than in the application that has more demand for the cache resources. This paper proposes utility-based cache partitioning (UCP), a low-overhead, runtime mechanism that partitions a shared cache between multiple applications depending on the reduction in cache misses that each application is likely to obtain for a given amount of cache resources. The proposed mechanism monitors each application at runtime using a novel, cost-effective, hardware circuit that requires less than 2kB of storage. The information collected by the monitoring circuits is used by a partitioning algorithm to decide the amount of cache resources allocated to each application. Our evaluation, with 20 multiprogrammed workloads, shows that UCP improves performance of a dual-core system by up to 23% and on average 11% over LRU-based cache partitioning.
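
The idea of allocating cache by expected miss reduction rather than by demand can be illustrated with a minimal sketch for two applications. It is not the paper's hardware circuit or its exact algorithm. It assumes a set-associative cache partitioned by ways, and that a monitor supplies, for each application, a hit counter per LRU recency position, so that the hits obtainable with n ways are the sum of the first n counters. The function names, the `min_ways` parameter, and the counter values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def hits_with_ways(way_hits: List[int], ways: int) -> int:
    """Hits an application would see if given `ways` ways.

    Assumes way_hits[i] counts hits at LRU recency position i, so an
    access that hits at position i would also hit in any allocation
    of at least i + 1 ways.
    """
    return sum(way_hits[:ways])


def best_two_way_partition(
    hits_a: List[int],
    hits_b: List[int],
    total_ways: int,
    min_ways: int = 1,  # hypothetical floor so neither application is starved
) -> Tuple[int, int]:
    """Try every split of the ways and keep the one with the most total hits."""
    best_split = (min_ways, total_ways - min_ways)
    best_hits = -1
    for ways_a in range(min_ways, total_ways - min_ways + 1):
        ways_b = total_ways - ways_a
        total = hits_with_ways(hits_a, ways_a) + hits_with_ways(hits_b, ways_b)
        if total > best_hits:
            best_hits = total
            best_split = (ways_a, ways_b)
    return best_split


if __name__ == "__main__":
    # Made-up counters: A keeps gaining hits as it receives more ways;
    # B gains almost nothing beyond its first way, however often it misses.
    hits_a = [100, 80, 60, 40, 20, 10, 5, 2]
    hits_b = [30, 2, 1, 1, 0, 0, 0, 0]
    print(best_two_way_partition(hits_a, hits_b, total_ways=8))  # (7, 1)
```

With these made-up counters the sketch gives seven of the eight ways to application A, because every extra way for B recovers at most two hits. A demand-driven policy such as LRU could instead let B occupy much of the cache if B issues many missing accesses.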
