Publication | Open Access
BSTC
Citations: 42
References: 62
Year: 2019
Unknown Venue
Engineering, Machine Learning, Binarized Neural Networks, Hardware Acceleration, Edge Computing, Computer Architecture, Computer Engineering, BNN Network Design, Parallel Programming, Computer Science, Parallel Computing, Deep Learning, GPU Cluster, BNN Inference, GPU Computing
Binarized neural networks (BNNs) promise tremendous performance improvement over traditional DNNs through simplified bit-level computation and significantly reduced memory access and storage cost. In addition, BNNs offer low cost, low energy consumption, and high robustness, showing great potential in resource-constrained, volatile, and latency-critical settings, which are critical for future HPC, cloud, and edge applications. However, the promised performance gain of BNN inference has never been fully demonstrated on general-purpose processors, particularly on GPUs, due to: (i) the challenge of extracting and leveraging sufficient fine-grained bit-level parallelism to saturate GPU cores when the batch size is small; (ii) the fundamental design conflict between the bit-based BNN algorithm and the word-based architecture; and (iii) BNN network designs that are unfriendly to the architecture and to performance. To address (i) and (ii), we propose a binarized-soft-tensor-core (BSTC) as a software-hardware codesign approach that constructs bit-manipulation capability for modern GPUs and thereby effectively harvests bit-level parallelism (BLP). To tackle (iii), we propose intra- and inter-layer fusion techniques so that the entire BNN inference execution can be packed into a single GPU kernel, avoiding the high cost of frequent kernel launching and releasing. Experiments show that our Singular-Binarized-Neural-Network (SBNN) design can achieve over 1000X speedup in raw inference latency over the state-of-the-art full-precision BNN inference for AlexNet on GPUs. Comparisons with CPU, GPU, FPGA, and Xeon-Phi demonstrate the effectiveness of our design. SBNN is open-sourced and available at https://github.com/uuudown/SBNN.
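As a rough illustration of the "simplified bit-level computation" the abstract refers to, the Python sketch below computes a dot product of two +1/-1 vectors using only XNOR and a population count. It is not taken from SBNN and does not reflect its GPU implementation; the function names `pack_bits` and `binary_dot`, and the convention that a set bit encodes +1, are assumptions made for this example.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into one integer.

    Assumed convention for this sketch: bit i is 1 when values[i] == +1
    and 0 when values[i] == -1.
    """
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors held as packed bits.

    XNOR marks the positions where the two vectors agree; each agreement
    contributes +1 and each disagreement -1, so the result is
    2 * popcount(XNOR) - n.
    """
    mask = (1 << n) - 1                  # keep only the n valid bits
    agree = ~(a_bits ^ b_bits) & mask    # XNOR restricted to n bits
    matches = bin(agree).count("1")      # population count
    return 2 * matches - n


# Usage: the bit-level result matches the ordinary multiply-accumulate.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
assert result == reference
```

On word-based hardware the same idea processes as many elements per instruction as there are bits in a machine word, which is the kind of bit-level parallelism the abstract describes harvesting.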