Publication | Closed Access
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference
Citations: 39
References: 0
Year: 2020
Unknown Venue
Artificial Intelligence, Robust FP16 Training, Engineering, Advanced Computing, Hardware Algorithm, Computer Architecture, Parallel Computing, Manycore Processor, Scalable Processor Core, Xeon Phi, Computer Engineering, Computer Science, TFLOPS 0.62V, FPGA Design, AI Training, Hardware Acceleration, Many-core Architecture, Processor Core, Parallel Programming
A processor core is presented for AI training and inference products. Leading-edge compute efficiency is achieved for robust fp16 training via efficient heterogeneous 2-D systolic array-SIMD compute engines leveraging compact DLFloat16 FPUs. Architectural flexibility is maintained for very high compute utilization across neural network topologies. A modular dual-corelet architecture with a shared scratchpad and a software-controlled network/memory interface enables scalability to many-core SoCs and large-scale systems. The 14nm AI core achieves fp16 peak performance of 3.0 TFLOPS at 0.62V and 1.4 TFLOPS/W at 0.54V.
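
The abstract names a 2-D systolic array as one of the compute engines but does not describe its dataflow. As a rough illustration of the general technique only, the sketch below simulates a generic output-stationary systolic array computing a matrix product. The function name `systolic_matmul`, the output-stationary dataflow, and the cycle model are assumptions made for illustration; none of them is taken from the paper, and the sketch uses ordinary Python floats rather than DLFloat16.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary 2-D systolic array computing a @ b.

    Hypothetical model, not the paper's design: processing element (i, j)
    holds the accumulator for output c[i][j]. Operands are skewed so that
    a[i][k] and b[k][j] meet at PE (i, j) on cycle t = i + j + k.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    if any(len(row) != k_dim for row in a):
        raise ValueError("inner dimensions of a and b do not match")

    acc = [[0.0] * n for _ in range(m)]   # one accumulator per PE
    cycles = m + n + k_dim - 2            # last operand pair meets at t = m+n+k-3

    for t in range(cycles):
        for i in range(m):
            for j in range(n):
                k = t - i - j             # operand index reaching PE (i, j) now
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, cycles)
```

In this model a PE is idle on any cycle where `k` falls outside the valid range, so small matrices leave most of the array unused during fill and drain. That is one general reason array utilization depends on the shapes of the layers being computed.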