Publication | Open Access
Dual Encoder–Decoder Network for Land Cover Segmentation of Remote Sensing Image
Citations: 40
References: 43
Year: 2023
Keywords: Convolutional Neural Network, Engineering, Land Use, Decoding Strategies, Multi-image Fusion, Land Cover, Earth Science, Social Sciences, Image Classification, Image Analysis, Pattern Recognition, Semantic Segmentation, Land Cover Segmentation, Remote Sensing Image, Machine Vision, Object Detection, Geography, Dual Encoder–decoder Network, Deep Learning, Feature Fusion, Computer Vision, Land Cover Map, Remote Sensing, Cover Mapping, Encoding Stage, Decoding Stage
Although vision transformer-based methods (ViTs) outperform convolutional neural networks (CNNs) on image recognition tasks, their pixel-level semantic segmentation ability is limited because they do not explicitly exploit local biases. A variety of hybrid ViT and CNN structures have recently been proposed, but these methods have poor multi-scale fusion ability and cannot accurately segment high-resolution land cover remote sensing images with complex content. This work therefore proposes a dual encoder-decoder network named DEDNet. In the encoding stage, a CNN encoder and a Transformer encoder run in parallel to extract the local and global information of the image. In the decoding stage, a cross-stage fusion (CF) module provides neighborhood attention guidance that improves the localization of small targets and effectively avoids intra-class inconsistency. Meanwhile, a multi-head feature extraction (MFE) module strengthens the recognition of target boundaries and effectively avoids inter-class ambiguity. Before the output, a fusion spatial pyramid pooling (FSPP) classifier merges the outputs of the two decoding strategies. Experiments demonstrate that the proposed model has superior generalization performance and can handle various land cover semantic segmentation tasks.
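The parallel local/global encoding idea described in the abstract can be illustrated with a toy sketch in plain Python. This is not DEDNet and does not reproduce the CF, MFE, or FSPP modules. It assumes a single-channel image given as a list of lists, uses a 3x3 mean filter as a stand-in for a CNN encoder and single-head self-attention over scalar pixel values as a stand-in for a Transformer encoder, and fuses the two by a weighted sum. All function names, the `alpha` weight, and the `threshold` are hypothetical choices made for illustration.

```python
import math

def local_branch(img):
    """CNN-like stand-in: 3x3 mean filter with zero padding (local context only)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += img[y][x]
            out[i][j] = total / 9.0
    return out

def global_branch(img):
    """Transformer-like stand-in: single-head self-attention over all pixels.

    Assumption: query, key, and value are all the raw scalar pixel value.
    """
    w = len(img[0])
    flat = [v for row in img for v in row]
    attended = []
    for q in flat:
        scores = [q * k for k in flat]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        attended.append(sum(wt * v for wt, v in zip(weights, flat)) / z)
    return [attended[r * w:(r + 1) * w] for r in range(len(img))]

def fuse(local_feat, global_feat, alpha=0.5):
    """Weighted sum of the two branches (hypothetical fusion rule)."""
    return [[alpha * l + (1 - alpha) * g for l, g in zip(lrow, grow)]
            for lrow, grow in zip(local_feat, global_feat)]

def segment(img, threshold=0.5):
    """Binary per-pixel labels obtained by thresholding the fused feature map."""
    fused = fuse(local_branch(img), global_branch(img))
    return [[1 if v > threshold else 0 for v in row] for row in fused]
```

Calling `segment(img)` on a small grid returns a mask of the same height and width. In a real model each branch would produce learned multi-channel features at several scales, which is where multi-scale fusion becomes the hard part.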