ResNeSt: Split-Attention Networks

TLDR

The ability to learn richer network representations generally boosts the performance of deep learning models. To improve representation learning in convolutional neural networks, we present a multi-branch architecture that applies channel-wise attention across different branches, combining the complementary strengths of feature-map attention and multi-path representation. The resulting Split-Attention module is a modular, drop-in replacement for residual blocks that produces richer, more diverse representations through cross-feature interactions. Adding it to RegNet-Y and FBNetV2 improves their performance, and replacing the residual blocks of ResNet with it yields a new variant, ResNeSt, which outperforms EfficientNet on the accuracy/latency trade-off.
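As a rough illustration of how channel-wise attention across branches can fuse several feature maps, the pure-Python sketch below processes one group of R branches in five steps: sum the branches, global-average-pool the sum into a per-channel descriptor, compute per-branch logits for each channel, apply a softmax across branches separately for each channel, and take the attention-weighted sum of the branches. This is a simplified sketch, not the authors' implementation. The gate is assumed to be a single hypothetical linear map per branch (`weights`), standing in for the small learned gating network of the published module. The names `split_attention` and `softmax` are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]  # channels x flattened spatial positions (H*W)


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def split_attention(branches: List[Matrix], weights: List[Matrix]) -> Matrix:
    """Fuse R branch feature maps with channel-wise attention across branches.

    branches: R feature maps, each C x N.
    weights:  R hypothetical C x C matrices mapping the pooled descriptor to
              per-channel logits for each branch (a simplified gate).
    """
    radix = len(branches)
    channels = len(branches[0])
    # 1. Element-wise sum of all branches.
    fused = [[sum(b[c][n] for b in branches) for n in range(len(branches[0][c]))]
             for c in range(channels)]
    # 2. Global average pooling over spatial positions -> C-dim descriptor.
    s = [sum(row) / len(row) for row in fused]
    # 3. Per-branch logits for every channel (hypothetical linear gate).
    logits = [[sum(w[c][k] * s[k] for k in range(channels)) for c in range(channels)]
              for w in weights]
    # 4. Softmax across branches, independently for each channel.
    attn = [[0.0] * channels for _ in range(radix)]
    for c in range(channels):
        probs = softmax([logits[r][c] for r in range(radix)])
        for r in range(radix):
            attn[r][c] = probs[r]
    # 5. Attention-weighted sum of the branches, channel by channel.
    return [[sum(attn[r][c] * branches[r][c][n] for r in range(radix))
             for n in range(len(branches[0][c]))] for c in range(channels)]


# With all-zero gate weights every branch gets weight 1/R, so the output is
# the plain average of the branches.
b0 = [[1.0, 3.0], [2.0, 2.0]]
b1 = [[3.0, 5.0], [0.0, 4.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
print(split_attention([b0, b1], [zero, zero]))  # [[2.0, 4.0], [1.0, 3.0]]
```

Because the softmax weights for each channel sum to one, the output is a per-channel convex combination of the branches. The gate can therefore emphasize a different branch for each channel, which is what distinguishes this fusion from a fixed sum or concatenation.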

Abstract

The ability to learn richer network representations generally boosts the performance of deep learning models. To improve representation-learning in convolutional neural networks, we present a multi-branch architecture, which applies channel-wise attention across different network branches to leverage the complementary strengths of both feature-map attention and multi-path representation. Our proposed Split-Attention module provides a simple and modular computation block that can serve as a drop-in replacement for the popular residual block, while producing more diverse representations via cross-feature interactions. Adding a Split-Attention module into the architecture design space of RegNet-Y and FBNetV2 directly improves the performance of the resulting network. Replacing residual blocks with our Split-Attention module, we further design a new variant of the ResNet model, named ResNeSt, which outperforms EfficientNet in terms of the accuracy/latency trade-off.
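The drop-in claim rests on the block interface. A Split-Attention block maps a feature map to one of the same shape, so it can take the place of the transform inside a residual connection. The minimal sketch below makes that concrete under simplifying assumptions. The spatial size is 1 (one value per channel), so global average pooling is a no-op. `plain_transform` and `split_attention_transform`, along with its `scales` and `gate_weights`, are hypothetical toy stand-ins for learned layers, not the actual ResNet or ResNeSt blocks.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]  # one value per channel (spatial size 1 for brevity)


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Return ReLU(x + F(x)); any channel-preserving F can be plugged in."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve the number of channels")
    return [max(0.0, a + b) for a, b in zip(x, fx)]


def plain_transform(x: Vector) -> Vector:
    """Hypothetical single-path transform standing in for a ResNet bottleneck."""
    return [0.5 * v for v in x]


def split_attention_transform(
    x: Vector,
    scales: Sequence[float] = (0.5, 2.0),
    gate_weights: Sequence[float] = (1.0, -1.0),
) -> Vector:
    """Toy multi-branch transform fused by channel-wise softmax attention.

    scales and gate_weights are hypothetical stand-ins for learned parameters.
    """
    # Each branch r applies its own toy transform: scales[r] * x.
    branches = [[s * v for v in x] for s in scales]
    # Sum over branches; with spatial size 1, global average pooling is a no-op.
    pooled = [sum(col) for col in zip(*branches)]
    out = []
    for c, p in enumerate(pooled):
        # Per-branch logits for channel c, then a stable softmax across branches.
        logits = [w * p for w in gate_weights]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        out.append(sum(e / total * branch[c] for e, branch in zip(exps, branches)))
    return out


x = [1.0, -4.0]
print(residual_block(x, plain_transform))            # [1.5, 0.0]
print(residual_block(x, split_attention_transform))  # same length, different values
```

Only the transform argument changes between the two calls. The surrounding residual connection, and therefore the rest of the network, is untouched. This is the sense in which a shape-preserving multi-branch block can replace a standard residual block.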
