Publication | Closed Access
OmniDRL: A 29.3 TFLOPS/W Deep Reinforcement Learning Processor with Dualmode Weight Compression and On-chip Sparse Weight Transposer
Citations: 16
References: 4
Year: 2021
Unknown Venue
Engineering, Machine Learning, Computer Architecture, Education, Reinforcement Learning (Educational Psychology), Learning Control, Lifelong Reinforcement Learning, TFLOPS/W DRL Processor, DRL Training, High-performance Architecture, Sparse Neural Network, Computing Systems, Parallel Computing, Performance Improvement, Exponent Mean Delta, Dualmode Weight Compression, Computer Engineering, Computer Science, Deep Learning, Neural Architecture Search, Hardware Acceleration, Deep Reinforcement Learning, Domain-specific Accelerator, Robotics
This paper presents OmniDRL, a deep reinforcement learning (DRL) processor that achieves 4.18 TFLOPS and 29.3 TFLOPS/W. A group-sparse training core and exponent mean delta encoding are proposed to enable weight and feature map compression in every iteration of DRL training. A sparse weight transposer transposes compressed weights on-chip, reducing external memory access. The processor is fabricated in 28 nm CMOS technology and occupies a 3.6 × 3.6 mm² die area. It achieves 7.16 TFLOPS/W energy efficiency when training a robot agent (MuJoCo HalfCheetah, TD3), which is 2.4× higher than the previous state of the art.
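To illustrate why transposing weights in compressed form matters: training reads a weight matrix W in the forward pass and its transpose in the backward pass, so a design that stores only the compressed W must be able to produce the compressed transpose without expanding to dense form. The abstract does not describe the chip's compression format or transposer circuit, so the sketch below is only a software analogy. It assumes the generic CSR (compressed sparse row) layout as a stand-in, and the function name `csr_transpose` is hypothetical.

```python
from typing import List, Tuple


def csr_transpose(
    values: List[float],
    col_idx: List[int],
    row_ptr: List[int],
    n_cols: int,
) -> Tuple[List[float], List[int], List[int]]:
    """Transpose a CSR matrix without ever building the dense matrix.

    CSR is an assumed stand-in format here, not the format used by OmniDRL.
    Returns (values, col_idx, row_ptr) of the transposed matrix, also in CSR.
    """
    n_rows = len(row_ptr) - 1
    nnz = len(values)

    # Count the nonzeros in each column; these become the row lengths of W^T.
    counts = [0] * n_cols
    for c in col_idx:
        counts[c] += 1

    # Prefix sum of the counts gives the row pointer of W^T.
    t_row_ptr = [0] * (n_cols + 1)
    for c in range(n_cols):
        t_row_ptr[c + 1] = t_row_ptr[c] + counts[c]

    # Scatter each nonzero into its slot in W^T, tracking the next free
    # position per output row.
    t_values = [0.0] * nnz
    t_col_idx = [0] * nnz
    next_slot = t_row_ptr[:-1]  # slicing copies, so t_row_ptr is not mutated
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            dst = next_slot[c]
            t_values[dst] = values[k]
            t_col_idx[dst] = r  # the old row index becomes the column index
            next_slot[c] += 1

    return t_values, t_col_idx, t_row_ptr


# W = [[1, 0, 2],
#      [0, 0, 3]]  stored as CSR
w_t = csr_transpose([1.0, 2.0, 3.0], [0, 2, 2], [0, 2, 3], n_cols=3)
```

For the 2×3 example matrix, `w_t` holds the CSR form of the 3×2 transpose, whose middle row is empty. The work is proportional to the number of nonzeros plus the matrix dimensions, so the cost scales with the compressed size, not the dense size.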