Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation

TLDR

Recent advances in semantic image segmentation have largely been driven by training deep convolutional neural networks (CNNs). The authors aim to enhance segmentation by exploiting contextual relationships, specifically patch‑patch and patch‑background interactions. They formulate conditional random fields with CNN‑based pairwise potentials to capture patch‑patch correlations, train them efficiently with piecewise learning to avoid costly inference, and use a multi‑scale image input with sliding pyramid pooling to model patch‑background context. The resulting model sets new state‑of‑the‑art performance on NYUDv2, PASCAL VOC 2012, PASCAL‑Context, and SIFT‑flow, achieving an intersection‑over‑union of 78.0 on the challenging PASCAL VOC 2012 dataset.

Abstract

Recent advances in semantic image segmentation have mostly been achieved by training deep convolutional neural networks (CNNs). We show how to improve semantic segmentation through the use of contextual information. Specifically, we explore 'patch-patch' context between image regions, and 'patch-background' context. For learning from the patch-patch context, we formulate Conditional Random Fields (CRFs) with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the proposed deep structured model is then applied to avoid repeated expensive CRF inference for back propagation. For capturing the patch-background context, we show that a network design with traditional multi-scale image input and sliding pyramid pooling is effective for improving performance. Our experimental results set new state-of-the-art performance on a number of popular semantic segmentation datasets, including NYUDv2, PASCAL VOC 2012, PASCAL-Context, and SIFT-flow. In particular, we achieve an intersection-over-union score of 78.0 on the challenging PASCAL VOC 2012 dataset.
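
To make the piecewise training idea concrete, the following is a minimal NumPy sketch of one common form of the piecewise objective. It rests on assumptions the abstract does not spell out: each potential is treated as its own softmax likelihood (unary over K labels per patch, pairwise over K x K joint labels per edge), and the loss is the sum of their negative log-likelihoods. The function names (`log_softmax`, `piecewise_nll`) and the array layouts are hypothetical, chosen for illustration only. The point of the sketch is that no CRF inference and no global partition function appear anywhere in the loss.

```python
import numpy as np


def log_softmax(scores, axis=-1):
    """Numerically stable log-softmax along one axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def piecewise_nll(unary_scores, pairwise_scores, edges, labels):
    """Sum of independent per-potential negative log-likelihoods.

    Assumed (hypothetical) layouts:
      unary_scores:    (N, K)    score of each label at each patch
      pairwise_scores: (E, K, K) score of each label pair on each edge
      edges:           (E, 2)    integer patch indices (p, q) per edge
      labels:          (N,)      integer ground-truth label per patch
    Scores play the role of negative energies produced by the networks.
    """
    n_nodes = labels.shape[0]
    n_edges, k, _ = pairwise_scores.shape

    # Unary piece: each patch is normalised over its own K labels.
    unary_logp = log_softmax(unary_scores, axis=1)
    unary_term = -unary_logp[np.arange(n_nodes), labels].sum()

    # Pairwise piece: each edge is normalised over its own K*K label pairs,
    # so the normaliser is local and cheap to compute.
    pair_logp = log_softmax(pairwise_scores.reshape(n_edges, k * k), axis=1)
    joint_label = labels[edges[:, 0]] * k + labels[edges[:, 1]]
    pair_term = -pair_logp[np.arange(n_edges), joint_label].sum()

    return unary_term + pair_term


# Toy usage: 3 patches, 2 classes, 2 edges, random scores.
rng = np.random.default_rng(0)
labels = np.array([0, 1, 1])
edges = np.array([[0, 1], [1, 2]])
loss = piecewise_nll(rng.normal(size=(3, 2)), rng.normal(size=(2, 2, 2)), edges, labels)
```

Because every term is an ordinary softmax loss, gradients with respect to the unary and pairwise network outputs can be computed independently for each potential, which is what makes this kind of training cheap compared with running CRF inference at every back-propagation step.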
