Publication | Open Access
TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory
Citations: 60
References: 53
Year: 2021
Venue: Unknown
Keywords: Engineering, Machine Learning, Computer Architecture, Information Retrieval, Data Science, Data Mining, High-performance Architecture, Recommendation Systems, Memory Throughput, Parallel Computing, Abundant Transfer Throughput, Computer Engineering, Scalable Tensor Reduction, Computer Science, Data-intensive Computing, Memory Architecture, External-memory Algorithm, Hardware Acceleration, Many-core Architecture, Parallel Programming, Similarity Search, Collaborative Filtering, In-memory Computing, Vectorization
Personalized recommendation systems are gaining significant traction due to their industrial importance. An important building block of recommendation systems is the embedding layer, which is highly memory-intensive. The fundamental primitive of an embedding layer is a gather of embedding vectors followed by a vector reduction, which has low arithmetic intensity and is bottlenecked by memory throughput. To tackle this challenge, recent proposals employ near-data processing (NDP) at the DRAM rank level, achieving impressive speedups. We observe that prior NDP solutions based on rank-level parallelism leave significant performance potential on the table, as they do not fully exploit the abundant transfer throughput inherent in DRAM datapaths.
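To make the gather-and-reduce primitive concrete, the following is a minimal host-side sketch in plain Python. It is not the paper's NDP design. The function names `gather_reduce` and `arithmetic_intensity`, the choice of an element-wise sum as the reduction, and the 4-byte (fp32) element size are all assumptions made for illustration. The sketch shows why the operation is memory-bound: each gathered element is read once and takes part in roughly one addition.

```python
from typing import List, Sequence


def gather_reduce(table: Sequence[Sequence[float]],
                  indices: Sequence[int]) -> List[float]:
    """Gather the rows named by `indices` and sum them element-wise.

    Hypothetical helper: the reduction is assumed to be a plain sum.
    """
    dim = len(table[0])
    out = [0.0] * dim
    for idx in indices:
        row = table[idx]          # gather: one irregular row read per index
        for j in range(dim):
            out[j] += row[j]      # reduce: one add per element read
    return out


def arithmetic_intensity(num_lookups: int, dim: int,
                         bytes_per_elem: int = 4) -> float:
    """Approximate FLOPs per byte read, counting one add per gathered element.

    Assumes fp32 elements (4 bytes) and ignores index and output traffic.
    """
    flops = num_lookups * dim
    bytes_read = num_lookups * dim * bytes_per_elem
    return flops / bytes_read


# Tiny embedding table with 4 rows of dimension 2 (made-up values).
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
pooled = gather_reduce(table, [0, 2, 3])
```

Under these assumptions the ratio is 0.25 FLOP per byte regardless of table size or number of lookups, so throughput is set by how fast rows can be read rather than by compute.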