BRECQ: Pushing the Limit of Post-Training Quantization by Block\n Reconstruction

Abstract

We study the challenging task of neural network quantization without\nend-to-end retraining, called Post-training Quantization (PTQ). PTQ usually\nrequires a small subset of training data but produces less powerful quantized\nmodels than Quantization-Aware Training (QAT). In this work, we propose a novel\nPTQ framework, dubbed BRECQ, which pushes the limits of bitwidth in PTQ down to\nINT2 for the first time. BRECQ leverages the basic building blocks in neural\nnetworks and reconstructs them one-by-one. In a comprehensive theoretical study\nof the second-order error, we show that BRECQ achieves a good balance between\ncross-layer dependency and generalization error. To further employ the power of\nquantization, the mixed precision technique is incorporated in our framework by\napproximating the inter-layer and intra-layer sensitivity. Extensive\nexperiments on various handcrafted and searched neural architectures are\nconducted for both image classification and object detection tasks. And for the\nfirst time we prove that, without bells and whistles, PTQ can attain 4-bit\nResNet and MobileNetV2 comparable with QAT and enjoy 240 times faster\nproduction of quantized models. Codes are available at\nhttps://github.com/yhhhli/BRECQ.\n

References

Page 1

	Year	Citations

Page 1