Publication | Closed Access
Context Encoding for Semantic Segmentation
Citations: 1.5K
References: 50
Year: 2018
Venue: Unknown
Keywords: Convolutional Neural Network, Scene Analysis, Engineering, Machine Learning, Image Analysis, Data Science, Pattern Recognition, Semantic Segmentation, Spatial Resolution, Video Transformer, Machine Vision, Computer Science, Medical Image Computing, Deep Learning, Computer Vision, Fully Convolutional Network, Scene Interpretation, Context Encoding Module, Scene Understanding, Context Encoding
Recent work has made significant progress in improving spatial resolution for pixelwise labeling with the Fully Convolutional Network (FCN) framework by employing dilated/atrous convolution, utilizing multi-scale features, and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent feature maps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach has achieved new state-of-the-art results: 51.7% mIoU on PASCAL-Context and 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on the ADE20K test set, which surpasses the winning entry of the COCO-Place Challenge 2017. In addition, we explore how the Context Encoding Module can improve the feature representation of relatively shallow networks for image classification on the CIFAR-10 dataset. Our 14-layer network has achieved an error rate of 3.45%, which is comparable with state-of-the-art approaches with over 10× more layers. The source code for the complete system is publicly available.
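To make the abstract's description concrete, below is a minimal NumPy sketch of the general idea behind a Context Encoding Module: aggregate pixel features against a learned codebook into a global context vector, then use that vector to produce channel-wise attention that rescales the input feature maps. This is an illustrative approximation, not the authors' implementation; all parameter names and shapes (`codewords`, `smoothing`, `W_fc`, `b_fc`) are assumptions for the sketch.

```python
import numpy as np

def context_encoding_module(X, codewords, smoothing, W_fc, b_fc):
    """Simplified sketch of a context-encoding-style module.

    X:         input feature map, shape (C, H, W)
    codewords: learned codebook, shape (K, C)          (assumed parameter)
    smoothing: per-codeword smoothing factors, (K,)    (assumed parameter)
    W_fc, b_fc: fully connected layer producing the channel attention
    """
    C, H, W = X.shape
    feats = X.reshape(C, -1).T                      # (N, C), N = H*W pixels

    # Residuals between each pixel feature and each codeword: (N, K, C)
    resid = feats[:, None, :] - codewords[None, :, :]
    dist2 = (resid ** 2).sum(axis=-1)               # squared distances, (N, K)

    # Soft assignment of pixels to codewords (softmax over K)
    logits = -smoothing[None, :] * dist2
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)

    # Aggregate weighted residuals per codeword, then sum into one
    # global context vector of length C
    e = (assign[:, :, None] * resid).sum(axis=0)    # (K, C)
    encoded = np.maximum(e, 0.0).sum(axis=0)        # ReLU + aggregate, (C,)

    # Channel-wise attention from the global context (sigmoid gate in (0, 1))
    gamma = 1.0 / (1.0 + np.exp(-(W_fc @ encoded + b_fc)))

    # Selectively rescale (highlight/suppress) the input feature maps
    return X * gamma[:, None, None]
```

Because the gate is a sigmoid, each output channel is a scaled-down copy of its input channel; in the full system these gates are learned so that channels of classes likely to appear in the scene are emphasized.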