Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer

TLDR

Attention is essential to human visual perception and has recently been shown to improve artificial neural networks on computer vision and NLP tasks. This paper shows that, with a suitable definition of attention for convolutional neural networks, training a student network to mimic a teacher's attention maps significantly improves the student's performance. The authors propose several attention-transfer methods, report consistent gains across multiple datasets and CNN architectures, and release open-source code and models.

Abstract

Attention plays a critical role in human visual experience. Furthermore, it has recently been demonstrated that attention can also play an important role in the context of applying artificial neural networks to a variety of tasks from fields such as computer vision and NLP. In this work we show that, by properly defining attention for convolutional neural networks, we can actually use this type of information in order to significantly improve the performance of a student CNN network by forcing it to mimic the attention maps of a powerful teacher network. To that end, we propose several novel methods of transferring attention, showing consistent improvement across a variety of datasets and convolutional neural network architectures. Code and models for our experiments are available at https://github.com/szagoruyko/attention-transfer
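
To make the idea of mimicking attention maps concrete, here is a minimal sketch in plain Python. The abstract does not say how an attention map is computed, so the definition used here is an assumption for illustration: the squared activations are summed over channels at each spatial location, and the result is flattened and L2-normalized. The function names `attention_map` and `attention_transfer_loss` are hypothetical and are not taken from the released code.

```python
import math


def attention_map(activations):
    """Collapse a C x H x W activation tensor (nested lists) into a
    flattened, L2-normalized spatial attention map of length H*W."""
    channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])
    # Assumption: attention at a location is the sum over channels
    # of the squared activations at that location.
    flat = [
        sum(activations[c][i][j] ** 2 for c in range(channels))
        for i in range(height)
        for j in range(width)
    ]
    # Fall back to 1.0 so an all-zero map does not divide by zero.
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def attention_transfer_loss(student_acts, teacher_acts):
    """Mean squared difference between the normalized student and
    teacher attention maps (channel counts may differ)."""
    s = attention_map(student_acts)
    t = attention_map(teacher_acts)
    if len(s) != len(t):
        raise ValueError("student and teacher maps must share spatial size")
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


# Toy data: a 3-channel teacher and two 2-channel students, all 2x2 spatially.
teacher = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[2.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
student_good = [[[3.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 1.0]]]
student_bad = [[[0.0, 2.0], [0.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]

loss_good = attention_transfer_loss(student_good, teacher)
loss_bad = attention_transfer_loss(student_bad, teacher)
print(loss_good < loss_bad)
```

Because the channel dimension is collapsed before comparison, the student and teacher can have different widths as long as their spatial sizes match. In training, a term like this would typically be added to the usual classification loss with a weighting factor, which is also an assumption here rather than something the abstract specifies.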