Video Instance Segmentation with a Propose-Reduce Paradigm

TLDR

Prior methods typically segment individual frames or clips first and then merge the incomplete results through tracking or matching. The study proposes a Propose-Reduce paradigm that generates complete instance sequences for a video in a single step. The method builds a sequence propagation head on an image-level instance segmentation network, proposes multiple sequences, and reduces redundant sequences of the same instance to achieve robust, high-recall long-term propagation. The approach attains state-of-the-art results: 47.6% AP on the YouTube-VIS validation set and 70.4% J&F on the DAVIS-UVOS validation set.

Abstract

Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes in each frame of a video. Prior methods usually obtain segmentation for a frame or clip first and then merge the incomplete results by tracking or matching, which may cause error accumulation in the merging step. In contrast, we propose a new paradigm, Propose-Reduce, that generates complete sequences for input videos in a single step. We further build a sequence propagation head on an existing image-level instance segmentation network for long-term propagation. To ensure robustness and high recall, multiple sequences are proposed and redundant sequences of the same instance are reduced. We achieve state-of-the-art performance on two representative benchmark datasets: 47.6% AP on the YouTube-VIS validation set and 70.4% J&F on the DAVIS-UVOS validation set.
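
The "reduce" step can be illustrated with a minimal sketch. The abstract does not state how redundant sequences are detected, so this example assumes a greedy non-maximum suppression over a spatio-temporal IoU. The names `sequence_iou` and `reduce_sequences`, the per-sequence scores, and the 0.5 threshold are all hypothetical, and sets of pixel coordinates stand in for binary masks.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]          # (row, column)
Sequence = List[Set[Pixel]]      # one pixel set per frame (stand-in for a mask)


def sequence_iou(a: Sequence, b: Sequence) -> float:
    """Spatio-temporal IoU: intersections and unions summed over all frames."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same frames")
    inter = sum(len(fa & fb) for fa, fb in zip(a, b))
    union = sum(len(fa | fb) for fa, fb in zip(a, b))
    return inter / union if union else 0.0


def reduce_sequences(sequences: List[Sequence],
                     scores: List[float],
                     iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS over whole sequences (assumed criterion, not from the paper).

    Returns the indices of the kept sequences, highest score first.
    """
    order = sorted(range(len(sequences)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a sequence only if it does not overlap heavily with a kept one.
        if all(sequence_iou(sequences[i], sequences[k]) <= iou_threshold
               for k in kept):
            kept.append(i)
    return kept


def box(x0: int, x1: int, height: int = 2) -> Set[Pixel]:
    """Toy rectangular mask covering columns x0..x1-1 and `height` rows."""
    return {(y, x) for y in range(height) for x in range(x0, x1)}


# Three two-frame proposals: seq_b is a near-duplicate of seq_a,
# seq_c is a different instance elsewhere in the frame.
seq_a = [box(0, 4), box(0, 4)]
seq_b = [box(0, 4), box(1, 5)]
seq_c = [box(10, 14), box(10, 14)]

kept = reduce_sequences([seq_a, seq_b, seq_c], scores=[0.9, 0.8, 0.7])
print(kept)  # [0, 2]
```

Here `seq_b` overlaps `seq_a` with a sequence IoU of 14/18, above the assumed threshold, so it is dropped as a redundant proposal of the same instance, while `seq_c` is kept as a separate instance.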
