Publication | Closed Access
Shared memory multiplexing
Citations: 57
References: 12
Year: 2012
Unknown Venue
Keywords: Available Thread-level Parallelism, Thread Blocks, GPU Architecture, Engineering, Multiplexing, Shared Memory, Many-core Architecture, Computer Engineering, Computer Architecture, Parallel Programming, Computer Science, Thread Block, Shared Memory Multiplexing, Parallel Computing, GPU Cluster, GPU Computing, Multi-channel Memory Architecture
On-chip shared memory (a.k.a. local data share) is a critical resource to many GPGPU applications. In current GPUs, the shared memory is allocated when a thread block (also called a workgroup) is dispatched to a streaming multiprocessor (SM) and is released when the thread block is completed. As a result, the limited capacity of shared memory becomes a bottleneck for a GPU to host a high number of thread blocks, limiting the otherwise available thread-level parallelism (TLP). In this paper, we propose software and/or hardware approaches to multiplex the shared memory among multiple thread blocks.
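The capacity bottleneck described in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not taken from the paper: the function name `resident_blocks` and all hardware limits (48 KB of shared memory per SM, 1536 threads per SM, 8 blocks per SM) are assumptions chosen only for illustration, and register pressure is ignored.

```python
def resident_blocks(smem_per_sm, smem_per_block,
                    max_threads_per_sm, threads_per_block,
                    max_blocks_per_sm):
    """Upper bound on thread blocks resident on one SM (registers ignored)."""
    limits = [
        max_blocks_per_sm,                        # hardware block-slot limit
        max_threads_per_sm // threads_per_block,  # thread-slot limit
    ]
    if smem_per_block > 0:
        # Shared memory is held for a block's whole lifetime, so it caps
        # how many blocks can be resident at the same time.
        limits.append(smem_per_sm // smem_per_block)
    return min(limits)


# Hypothetical SM limits, for illustration only.
SMEM_PER_SM = 48 * 1024   # bytes of shared memory per SM (assumed)
MAX_THREADS = 1536        # resident threads per SM (assumed)
MAX_BLOCKS = 8            # resident blocks per SM (assumed)
THREADS_PER_BLOCK = 256   # assumed kernel launch configuration

for smem_per_block in (4 * 1024, 16 * 1024, 24 * 1024):
    n = resident_blocks(SMEM_PER_SM, smem_per_block,
                        MAX_THREADS, THREADS_PER_BLOCK, MAX_BLOCKS)
    print(f"{smem_per_block // 1024:2d} KB per block -> "
          f"{n} resident blocks, {n * THREADS_PER_BLOCK} threads")
```

Under these assumed limits, a kernel that uses 4 KB per block is bounded by the thread limit, while one that uses 16 KB or 24 KB per block is bounded by shared memory and leaves thread slots idle. That idle capacity is the thread-level parallelism that sharing the allocation among several thread blocks aims to recover.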