Publication | Closed Access
An Efficient Accelerator for Multiple Convolutions From the Sparsity Perspective
Year: 2020
Citations: 22
References: 14
Convolutional Neural Network, Engineering, Machine Learning, Hardware Algorithm, Computer Architecture, Ternary CNNs, Image Analysis, Sparse Neural Network, Computational Imaging, Parallel Computing, Irregular Sparsity, Computer Engineering, Sparsity Perspective, Computer Science, Deep Learning, Computer Vision, Hardware Acceleration, Convolutional Neural Networks, Domain-specific Accelerator
Convolutional neural networks (CNNs) have become one of the most widely used approaches in many fields. These networks deliver better performance as they grow deeper and larger. However, their computational complexity and large storage requirements impede hardware implementation. Quantized networks have been proposed to address this problem. In addition, various convolutional structures have been designed to meet the requirements of different applications. For example, whereas image classification relies on traditional convolutions (CONVs), networks for image generation usually combine traditional CONVs, dilated CONVs, and transposed CONVs, which makes hardware mapping difficult. In this brief, we recast this mapping problem as a sparsity problem and propose an efficient hardware architecture for sparse binary and ternary CNNs that exploits their sparsity and low bit-width. To this end, we propose an ineffectual data removing (IDR) mechanism that removes both regular and irregular sparsity using dual-channel processing elements (PEs). In addition, a flexible layered load balance (LLB) mechanism is introduced to alleviate load imbalance. The accelerator is implemented in 65-nm technology with a core size of 2.56 mm². It achieves an energy efficiency of 3.72 TOPS/W at 50.1 mW, which makes it a promising design for embedded devices.
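The abstract does not spell out how dilated and transposed CONVs are turned into a sparsity problem. One standard way to see the connection, shown below as a minimal 1-D NumPy sketch and not as the accelerator's actual datapath, is that a dilated CONV equals a traditional CONV whose kernel has zeros inserted between its taps, and a transposed CONV equals a traditional CONV over an input with zeros inserted between its samples. All function names here are hypothetical, and the ternary weights and small integer inputs are chosen only for illustration.

```python
import numpy as np

def correlate_valid(x, w):
    """Traditional 1-D CONV ('valid' cross-correlation): y[i] = sum_k w[k] * x[i + k]."""
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

def insert_zeros(v, step):
    """Place the entries of v every `step` positions, with zeros in between."""
    out = np.zeros(step * (len(v) - 1) + 1, dtype=v.dtype)
    out[::step] = v
    return out

def dilated_direct(x, w, d):
    """Dilated CONV computed directly: y[i] = sum_k w[k] * x[i + d * k]."""
    span = d * (len(w) - 1) + 1
    n = len(x) - span + 1
    return np.array([sum(w[k] * x[i + d * k] for k in range(len(w)))
                     for i in range(n)])

def transposed_direct(x, w, s):
    """Transposed CONV by scatter-add: each x[i] adds x[i] * w at offset s * i."""
    out = np.zeros(s * (len(x) - 1) + len(w), dtype=x.dtype)
    for i, xi in enumerate(x):
        out[s * i:s * i + len(w)] += xi * w
    return out

# Illustrative data (assumed, not from the paper): ternary weights in {-1, 0, 1}.
w = np.array([1, -1, 1])
x = np.array([1, 2, 3, 4, 5, 6, 7])

# Dilated CONV (dilation 2) as a traditional CONV with a zero-inserted kernel.
w_sparse = insert_zeros(w, 2)
dilated_as_sparse = correlate_valid(x, w_sparse)
dilated_reference = dilated_direct(x, w, 2)
kernel_zero_fraction = float(np.mean(w_sparse == 0))

# Transposed CONV (stride 2) as a traditional CONV over a zero-inserted,
# zero-padded input, using the flipped kernel.
x_small = np.array([1, 2, 3])
x_sparse = np.pad(insert_zeros(x_small, 2), len(w) - 1)
transposed_as_sparse = correlate_valid(x_sparse, w[::-1])
transposed_reference = transposed_direct(x_small, w, 2)
input_zero_fraction = float(np.mean(x_sparse == 0))

print(np.array_equal(dilated_as_sparse, dilated_reference), kernel_zero_fraction)
print(np.array_equal(transposed_as_sparse, transposed_reference), input_zero_fraction)
```

In this sketch the inserted zeros sit at fixed, predictable positions, which is presumably the kind of regular sparsity the abstract refers to, while zero-valued ternary weights occur at data-dependent positions and would count as irregular. Every multiplication involving such a zero is ineffectual work that a mechanism like IDR would aim to skip.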