Publication | Closed Access

Multilevel Attention-Based Sample Correlations for Knowledge Distillation

Citations: 101
References: 25
Year: 2022

Abstract

Recently, model compression has been widely used for deploying cumbersome deep models on resource-limited edge devices in performance-demanding industrial Internet of Things (IoT) scenarios. As a simple yet effective model compression technique, knowledge distillation (KD) aims to transfer knowledge (e.g., sample relationships as relational knowledge) from a large teacher model to a small student model. However, existing relational KD methods usually build sample correlations directly from the feature maps at a certain middle layer of a deep neural network, which tends to overfit the teacher's feature maps and fails to focus on the most important sample regions. Motivated by this observation, we argue that the characteristics of important regions matter greatly, and we therefore introduce attention maps to construct sample correlations for knowledge distillation. Specifically, with attention maps from multiple middle layers, attention-based sample correlations are built upon the most informative sample regions and serve as a novel and effective form of relational knowledge for distillation. We refer to the proposed method as multilevel attention-based sample correlations for knowledge distillation (MASCKD). We perform extensive experiments on popular KD datasets for image classification, image retrieval, and person reidentification, where the results demonstrate the effectiveness of the proposed method for relational KD.
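The core idea described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names, the choice of spatial attention (channel-wise sum of squares), and the mean-squared-error match between correlation matrices are all assumptions made for the example; the paper's exact attention and distance formulations may differ.

```python
import numpy as np

def attention_map(feat):
    """Spatial attention from a feature map of shape (B, C, H, W).

    Collapses the channel dimension by summing squared activations,
    then L2-normalizes each sample's flattened attention vector.
    """
    att = (feat ** 2).sum(axis=1).reshape(feat.shape[0], -1)  # (B, H*W)
    return att / (np.linalg.norm(att, axis=1, keepdims=True) + 1e-8)

def sample_correlation(att):
    """Pairwise sample-correlation (Gram) matrix, shape (B, B)."""
    return att @ att.T

def masckd_loss(teacher_feats, student_feats):
    """Match attention-based sample correlations across multiple layers.

    teacher_feats / student_feats: lists of (B, C, H, W) arrays taken
    from middle layers; channel counts may differ between the two models
    because attention collapses the channel dimension.
    """
    loss = 0.0
    for ft, fs in zip(teacher_feats, student_feats):
        gt = sample_correlation(attention_map(ft))
        gs = sample_correlation(attention_map(fs))
        loss += np.mean((gt - gs) ** 2)
    return loss
```

In this sketch the correlation matrices compare samples within a batch, so the knowledge transferred is relational (how samples relate to one another under the attention maps) rather than a direct imitation of the teacher's feature values.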
