Publication | Closed Access
RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation
Citations: 3.2K
References: 38
Year: 2017
Unknown Venue
Convolutional Neural Network, Engineering, Machine Learning, Convolution Striding, Image Analysis, Data Science, Multi-path Refinement Networks, Present Refinenet, Semantic Segmentation, Video Transformer, Machine Vision, Feature Learning, Identity Mapping Mindset, Object Detection, Vision Language Model, Computer Science, Deep Learning, Computer Vision, Scene Interpretation, Scene Understanding, Image Segmentation
Deep convolutional neural networks excel at object recognition but lose image resolution through repeated subsampling operations such as pooling or convolution striding. This work introduces RefineNet, a generic multi-path refinement network that uses the features available at every stage of the down-sampling process to produce high-resolution predictions via long-range residual connections. RefineNet refines high-level semantic features from deeper layers with fine-grained features from earlier convolutions, builds its components from identity-mapping residual connections so that the network can be trained end to end, and adds chained residual pooling to capture rich background context efficiently. Experiments on seven public datasets set new state-of-the-art results, including an intersection-over-union score of 83.4 on PASCAL VOC 2012, which the authors report as the best result to date.
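For readers unfamiliar with the metric, the following is a minimal sketch of how a mean intersection-over-union score is commonly computed from predicted and ground-truth labels. The function name `mean_iou` and the flat-list input format are assumptions made for illustration. The official PASCAL VOC evaluation may differ in details, for example in how ignored pixels are handled.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels,
    one label per pixel (hypothetical input format).
    """
    inter = [0] * num_classes  # pixels where prediction and truth agree on class c
    union = [0] * num_classes  # pixels labeled c in prediction or truth
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    # Skip classes that never appear, so they do not distort the average.
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious)


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

A reported score of 83.4 corresponds to a value of 0.834 on this scale, averaged over the benchmark's classes.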
Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense classification problems such as semantic segmentation. However, repeated subsampling operations like pooling or convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments and set new state-of-the-art results on seven public datasets. In particular, we achieve an intersection-over-union score of 83.4 on the challenging PASCAL VOC 2012 dataset, which is the best reported result to date.
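The abstract names chained residual pooling without giving layer details, so the following is only a rough sketch of the general idea: a chain of stride-1 pooling blocks whose outputs are summed back onto the input through residual connections, so the spatial size is preserved while each block widens the context. The window size, the number of blocks, and the scalar `weights` (standing in for the learned convolution that would follow each pooling step) are all illustrative assumptions, not the authors' configuration.

```python
def max_pool_same(x, k=5):
    """Stride-1 max pooling over a k x k window, keeping the input size.

    At the borders only the part of the window inside the map is used.
    """
    h, w = len(x), len(x[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = max(
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            )
    return out


def chained_residual_pooling(x, weights=(0.5, 0.5), k=5):
    """Sum the input with the outputs of a chain of pooling blocks.

    x is a single-channel feature map given as a list of rows.
    Each block pools the previous block's output and scales it by one
    scalar weight (an assumed stand-in for a learned convolution).
    """
    out = [row[:] for row in x]  # identity path: the input passes through unchanged
    path = x
    for wgt in weights:
        pooled = max_pool_same(path, k)
        path = [[wgt * v for v in row] for row in pooled]
        # Residual connection: add this block's output to the running sum.
        out = [[o + p for o, p in zip(orow, prow)] for orow, prow in zip(out, path)]
    return out


# A single active pixel spreads its response to its neighbours.
fmap = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
result = chained_residual_pooling(fmap, weights=(0.5, 0.5), k=3)
```

Because the input is added back unchanged, the block can only add context on top of the original features, which matches the identity-mapping idea the abstract describes for the network's components.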