Publication | Closed Access
ResNeSt: Split-Attention Networks
Citations: 1.2K
References: 69
Year: 2022
Convolutional Neural Network, Engineering, Machine Learning, Deep Learning Models, Recurrent Neural Network, Natural Language Processing, Richer Network Representations, Data Science, Sparse Neural Network, Visual Question Answering, Video Transformer, Machine Translation, Computer Engineering, Vision Language Model, Computer Science, Deep Learning, Neural Architecture Search, Resnet Model, Computer Vision, Model Compression, Split-attention Networks, Deep Neural Networks
Learning richer network representations generally boosts the performance of deep learning models. To improve representation learning in convolutional neural networks, we present a multi-branch architecture that applies channel-wise attention across network branches, combining the complementary strengths of feature-map attention and multi-path representation. The Split-Attention module is a modular, drop-in replacement for the residual block that produces more diverse representations through cross-feature interactions. Adding the module to the RegNet-Y and FBNetV2 design spaces improves the resulting networks, and replacing residual blocks with it yields ResNeSt, a new ResNet variant that outperforms EfficientNet on the accuracy/latency trade-off.
The ability to learn richer network representations generally boosts the performance of deep learning models. To improve representation-learning in convolutional neural networks, we present a multi-branch architecture, which applies channel-wise attention across different network branches to leverage the complementary strengths of both feature-map attention and multi-path representation. Our proposed Split-Attention module provides a simple and modular computation block that can serve as a drop-in replacement for the popular residual block, while producing more diverse representations via cross-feature interactions. Adding a Split-Attention module into the architecture design space of RegNet-Y and FBNetV2 directly improves the performance of the resulting network. Replacing residual blocks with our Split-Attention module, we further design a new variant of the ResNet model, named ResNeSt, which outperforms EfficientNet in terms of the accuracy/latency trade-off.
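The abstract describes channel-wise attention applied across branches but does not spell out the computation. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes the branch feature maps are fused by summation, global-average-pooled into a per-channel descriptor, passed through a small two-layer network, and normalized with a softmax across branches. The function name `split_attention`, the weight matrices `w1` and `w2`, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np

def split_attention(branches, w1, w2):
    """Fuse R branch feature maps with channel-wise attention (illustrative sketch).

    branches: array of shape (R, C, H, W), one feature map per branch.
    w1:       array of shape (C, D), hypothetical first dense layer.
    w2:       array of shape (D, R * C), hypothetical second dense layer.
    Returns the fused map of shape (C, H, W) and the weights of shape (R, C).
    """
    R, C, H, W = branches.shape
    # Sum the branches, then global-average-pool to one value per channel.
    descriptor = branches.sum(axis=0).mean(axis=(1, 2))      # (C,)
    hidden = np.maximum(descriptor @ w1, 0.0)                 # (D,), ReLU
    logits = (hidden @ w2).reshape(R, C)                      # (R, C)
    # Softmax across branches: for each channel the R weights sum to 1.
    logits = logits - logits.max(axis=0, keepdims=True)       # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=0, keepdims=True)             # (R, C)
    # Weight each branch per channel and sum over branches.
    fused = (attn[:, :, None, None] * branches).sum(axis=0)   # (C, H, W)
    return fused, attn

# Example with random inputs (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
R, C, H, W, D = 2, 8, 4, 4, 4
branches = rng.standard_normal((R, C, H, W))
w1 = rng.standard_normal((C, D)) * 0.1
w2 = rng.standard_normal((D, R * C)) * 0.1
fused, attn = split_attention(branches, w1, w2)
print(fused.shape)
```

Because the weights for each channel form a convex combination over branches, the fused map keeps the shape of a single branch, which is what lets a block of this kind stand in for a residual block without changing the surrounding layer shapes.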