Global–Local 3-D Convolutional Transformer Network for Hyperspectral Image Classification

Abstract

Benefiting from powerful feature extraction capabilities, convolutional neural networks (CNNs) have gained prominence in hyperspectral image (HSI) classification. Nevertheless, because convolution kernels have restricted receptive fields, CNN-based methods fail to learn the complex characteristics of long-range sequences. Vision transformers, in contrast, learn long-range dependencies from a global view but ignore local region features. To overcome these limitations, we propose a novel method entitled the global-local three-dimensional convolutional transformer network (GTCT), in which 3-D convolution is embedded in a dual-branch transformer to simultaneously capture global-local associations in both the spectral and spatial domains. In particular, the global-local spectral convolutional transformer (GECT) is designed to exploit global spectral sequence signatures and local spectral relationships between bands. Symmetrically, the global-local spatial convolutional transformer (GACT) is devised to exploit local spatial context features and global interactions among different pixels. In addition, the weighted multiscale spectral-spatial feature interaction (WMSFI) module adaptively fuses multiscale global-local spectral-spatial information with trainable weights. Furthermore, a spectral-spatial global attention mechanism (SSGAM) is incorporated into the multi-head convolutional attention to further integrate discriminative spectral-spatial information. Extensive experiments on four HSI datasets, including GF-5 and ZY1-02D satellite hyperspectral images, demonstrate the superiority of the proposed GTCT method over other state-of-the-art algorithms, with fewer parameters and fewer floating-point operations (FLOPs) in practical applications.
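
The idea of fusing multiscale features with trainable weights can be illustrated with a minimal NumPy sketch. The abstract does not specify how WMSFI normalizes or applies its weights, so the softmax normalization, the assumption that all feature maps share one shape, and the names `softmax` and `weighted_fusion` are hypothetical choices made for illustration only, not the authors' implementation.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def weighted_fusion(features, logits):
    """Fuse same-shaped feature maps with softmax-normalized weights.

    features: list of arrays with identical shape, one per scale (assumed).
    logits:   one scalar per scale; in a real network these would be
              trainable parameters (assumed, not taken from the paper).
    """
    weights = softmax(np.asarray(logits, dtype=float))
    stacked = np.stack(features, axis=0)            # (num_scales, ...)
    fused = np.tensordot(weights, stacked, axes=1)  # weighted sum over scales
    return fused, weights


# Three toy "scales" with constant values 1, 2 and 3.
features = [np.full((2, 3), float(k)) for k in (1, 2, 3)]
fused, weights = weighted_fusion(features, logits=[0.0, 0.0, 0.0])
```

With equal logits, each scale receives the same weight and the fused map is the plain average of the inputs; during training, gradient updates to the logits would shift emphasis toward the more informative scales.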
