Remote Sensing Image Scene Classification Using Multiscale Feature Fusion Covariance Network With Octave Convolution

Abstract

In remote sensing scene classification (RSSC), features can be extracted at different spatial frequencies: high-frequency features usually represent detailed information, whereas low-frequency features usually represent global structures. However, it is challenging to extract meaningful semantic information for RSSC tasks using only high-frequency or only low-frequency features. Moreover, the spatial composition of remote sensing images (RSIs) is more complex than that of natural images, and the scales of objects vary significantly. In this article, a multiscale feature fusion covariance network (MF2CNet) with octave convolution (Oct Conv) is proposed to extract multifrequency and multiscale features from RSIs. First, the multifrequency feature extraction (MFE) module uses Oct Conv to obtain fine-grained frequency features. Then, the multiscale feature fusion (MF2) module fuses the features from different layers of MF2CNet. Finally, global covariance pooling (GCP) replaces global average pooling (GAP) to extract high-order information from RSIs and capture richer statistics of the deep features. The resulting multifrequency and multiscale features effectively improve the performance of CNNs. Experimental results on four public RSI datasets show that MF2CNet has advantages in RSSC over current state-of-the-art methods. The source code of this method can be found at https://github.com/liuqingxin-chd/MF2CNet.
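To illustrate the difference between GAP and GCP mentioned above, the following is a minimal, dependency-free sketch and not the authors' implementation. The function names, the toy feature map, and the choice to divide by the number of spatial positions are all assumptions made for illustration. Published GCP variants often add a matrix normalization step, which is omitted here.

```python
# Illustrative sketch only: names and data below are hypothetical,
# not taken from the MF2CNet source code.

def global_average_pooling(features):
    """First-order statistic: one mean value per channel."""
    return [sum(channel) / len(channel) for channel in features]


def global_covariance_pooling(features):
    """Second-order statistic: C x C covariance between channels.

    Assumption: covariance is normalized by the number of spatial
    positions n (population covariance), with no matrix normalization.
    """
    n = len(features[0])
    means = global_average_pooling(features)
    # Subtract each channel's mean from its spatial responses.
    centered = [[v - m for v in channel] for channel, m in zip(features, means)]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in centered]
        for ci in centered
    ]


# Toy feature map: 2 channels, 4 spatial positions (flattened H * W).
features = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
]

gap = global_average_pooling(features)     # length-C vector
gcp = global_covariance_pooling(features)  # C x C matrix
```

In this sketch GAP reduces each channel to a single mean, while GCP additionally records how channel responses vary together across spatial positions, which is the "high-order information" the abstract refers to.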
