Publication | Closed Access
A portable, automatic data quantizer for deep neural networks
Citations: 25
References: 51
Year: 2018
Venue: Unknown
Artificial Intelligence, Convolutional Neural Network, Deep Neural Networks, Engineering, Machine Learning, Sparse Neural Network, Automatic Quantization Framework, Autoencoders, Computer Engineering, Quantization Techniques, DNN Frameworks, Neural Architecture Search, Computer Science, Deep Learning, Automatic Data Quantizer, Model Compression
With the proliferation of AI-based applications and services, there are strong demands for efficient processing of deep neural networks (DNNs). DNNs are known to be both compute- and memory-intensive as they require a tremendous amount of computation and large memory space. Quantization is a popular technique to boost the efficiency of DNNs by representing a number with fewer bits, hence reducing both computational strength and memory footprint. However, finding an optimal number representation for a DNN is difficult due to a combinatorial explosion in feasible number representations with varying bit widths, which is only exacerbated by layer-wise optimization. Moreover, existing quantization techniques often target a specific DNN framework and/or hardware platform, lacking portability across execution environments. To address this, we propose libnumber, a portable, automatic quantization framework for DNNs. By introducing the Number abstract data type (ADT), libnumber encapsulates the internal representation of a number, hiding it from the user. The auto-tuner of libnumber then finds a compact representation (type, bit width, and bias) for the number that minimizes the user-supplied objective function while satisfying the accuracy constraint. Thus, libnumber effectively separates the concern of developing an effective DNN model from the low-level optimization of number representation. Our evaluation using eleven DNN models on two DNN frameworks targeting an FPGA platform demonstrates over 8× (7×) reduction in parameter size on average when up to 7% (1%) loss of relative accuracy is tolerable, with a maximum reduction of 16×, compared to the baseline using 32-bit floating-point numbers. This leads to a geometric-mean speedup of 3.79× with a maximum speedup of 12.77× over the baseline, while requiring only minimal programmer effort.
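To make the idea of tuning a number representation concrete, the following is a minimal sketch of a fixed-point format parameterized by bit width and bias, with a brute-force loop that picks the narrowest width whose quantization error stays under a tolerance. All names here (FixedPoint, quantize, quant_error) and the error-based stopping rule are illustrative assumptions, not libnumber's actual API; the real auto-tuner searches over type, bit width, and bias and checks the accuracy constraint by evaluating the model itself.

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// Hypothetical fixed-point "Number": value = raw * 2^(-bias), stored in
// `bit_width` signed bits. An illustration only, not libnumber's real ADT.
struct FixedPoint {
    int bit_width;
    int bias;

    // Round-trip a float through the representation, saturating at the range.
    double quantize(double x) const {
        double scale = std::ldexp(1.0, bias);                 // 2^bias
        double max_raw = std::ldexp(1.0, bit_width - 1) - 1;  // 2^(w-1) - 1
        double raw = std::round(x * scale);
        raw = std::min(std::max(raw, -max_raw - 1.0), max_raw);
        return raw / scale;
    }
};

// Mean absolute quantization error of a parameter tensor under a format.
double quant_error(const std::vector<double>& params, const FixedPoint& fp) {
    double err = 0.0;
    for (double p : params) err += std::abs(p - fp.quantize(p));
    return err / params.size();
}

int main() {
    // Toy stand-in for one layer's weights.
    std::vector<double> weights = {0.031, -0.72, 0.18, 0.004, -0.25, 0.61};

    double max_abs = 0.0;
    for (double w : weights) max_abs = std::max(max_abs, std::abs(w));

    // Auto-tuning loop: take the narrowest bit width whose error stays below
    // a tolerance (a stand-in for the paper's accuracy constraint).
    const double tolerance = 0.01;
    for (int bits = 2; bits <= 16; ++bits) {
        // Largest bias (finest precision) that still covers max_abs.
        int bias = bits - 1 - static_cast<int>(std::ceil(std::log2(max_abs)));
        FixedPoint fp{bits, bias};
        double err = quant_error(weights, fp);
        if (err < tolerance) {
            std::cout << "chose " << bits << "-bit fixed point, bias " << bias
                      << ", mean error " << err << "\n";
            break;
        }
    }
    return 0;
}
```

In this toy setting the loop settles on a 6-bit format for the sample weights; a per-layer search of this kind is what makes the space combinatorial once every layer may choose its own type, width, and bias.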