Publication | Closed Access
Practical challenges in delivering the promises of real processing-in-memory machines
Citations: 32
References: 13
Year: 2018
Unknown Venue
Engineering, Memory Design, Emerging Memory Technology, Computer Architecture, Hardware Systems, Parallel Algorithms, Multi-channel Memory Architecture, High-performance Architecture, Computing Systems, Adaptive Memory, Memory Devices, Parallel Computing, Electrical Engineering, Computer Engineering, Computer Science, Memory Architecture, Pim Machine, Practical Challenges, Von Neumann Bottleneck, Parallel Programming, Ample Parallelism, In-memory Database, In-memory Computing
Processing-in-Memory (PiM) machines promise to overcome the von Neumann bottleneck in order to further scale performance and energy efficiency of computing systems by reducing the extent of data transfer and offering ample parallelism. In this paper, we take the memristive Memory Processing Unit (mMPU) as a case study of a PiM machine and scrutinize it in practical scenarios. Specifically, we explore the limitations of parallelism and data transfer elimination. We argue that lack of operand locality and arrangement might make data transfer inevitable in the mMPU. We then devise techniques to move data within the mMPU, without transferring it off-chip, and quantify their costs. Additionally, we present electrical parameters that might limit the parallelism offered by the mMPU and evaluate their impact. Using benchmarks from the LGsynth91 suite, their vector extensions, and a few synthetic data-parallel workloads, we show that the internal data transfer results in an increase of up to 1.5× in the execution time, while the parallelism can be limited in some cases to 256 gates, resulting in an increase in execution time by 1.1× to 2×.