Pyramid Scene Parsing Network

TLDR

Scene parsing is challenging because of its unrestricted open vocabulary and the diversity of scenes. The paper proposes the pyramid scene parsing network (PSPNet), whose pyramid pooling module aggregates context from different regions to exploit global context information. The resulting global prior representation produces good-quality scene parsing results, and PSPNet serves as a framework for pixel-level prediction. The method reports state-of-the-art results, placing first in the ImageNet scene parsing challenge 2016 and on the PASCAL VOC 2012 and Cityscapes benchmarks. A single PSPNet reaches 85.4% mIoU on PASCAL VOC 2012 and 80.2% on Cityscapes.

Abstract

Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields the new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.
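
The region-based context aggregation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it works on a single-channel 2D map in plain Python, uses nearest-neighbour upsampling for brevity, and leaves out the per-level convolutions a real network would apply. The function names and the default bin sizes (1, 2, 3, 6) are assumptions made for this example.

```python
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_avg_pool(feature: Grid, bins: int) -> Grid:
    """Average-pool a 2D map into a bins x bins grid of sub-regions."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        # Region rows: start = floor(i*h/bins), end = ceil((i+1)*h/bins)
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled: Grid, h: int, w: int) -> Grid:
    """Resize a bins x bins grid back to h x w by nearest-neighbour lookup."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pool(feature: Grid, levels: Sequence[int] = (1, 2, 3, 6)) -> List[Grid]:
    """Return the input map followed by one context map per pyramid level.

    Stacking these maps stands in for channel-wise concatenation.
    """
    h, w = len(feature), len(feature[0])
    maps = [feature]
    for bins in levels:
        maps.append(upsample_nearest(adaptive_avg_pool(feature, bins), h, w))
    return maps


if __name__ == "__main__":
    # Hypothetical 6x6 feature map with values 0..35.
    feature = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
    for level_map in pyramid_pool(feature):
        print(level_map[0])
```

The coarsest level (one bin) carries the global average to every position, while finer levels keep progressively more local context. This is the sense in which the pooled maps act as a global prior alongside the original features.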
