Video Segmentation by Tracking Many Figure-Ground Segments

TLDR

The authors propose an unsupervised video segmentation method that simultaneously tracks hundreds of holistic figure-ground segments and then refines them with a composite statistical inference approach. Tracks are initialized from figure-ground segment proposals, and an online non-local appearance model is trained for each track using multi-output regularized least squares. Because all tracks share the same training set, they can be updated efficiently in closed form. On SegTrack v2, a dataset of about 1,000 frames with pixel-level annotations collected for the evaluation, the framework outperforms state-of-the-art methods, showing efficiency and robustness across diverse video sequences.

Abstract

We propose an unsupervised video segmentation approach that simultaneously tracks multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated by a figure-ground segmentation algorithm. Online non-local appearance models are then trained incrementally for each track using a multi-output regularized least squares formulation. Because the same set of training examples is used for all segment tracks, a computational trick allows us to track hundreds of segments efficiently and to perform optimal online updates in closed form. In addition, we propose a new composite statistical inference approach for refining the obtained segment tracks: it breaks down the initial segment proposals and recombines them into better ones by using high-order statistic estimates from the appearance model and by enforcing temporal consistency. To evaluate the algorithm, we collected a dataset, SegTrack v2, containing about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches on this dataset, showing its efficiency and robustness to the challenges in different video sequences.
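
The shared-training-set idea can be illustrated with a minimal sketch of multi-output regularized least squares (ridge regression), assuming every track is scored on the same feature matrix X and differs only in its target column of Y. This is not the authors' implementation: the class name, the feature dimension, the regularizer `lam`, and the random data are hypothetical choices made for illustration. The point it shows is that the statistics X^T X + lam*I and X^T Y can be accumulated frame by frame, so a single linear solve yields the weights for all tracks and the online result equals the batch solution.

```python
import numpy as np


class SharedRidgeModel:
    """Multi-output ridge regression kept as accumulated statistics.

    Hypothetical illustration: one column of Y (and of the weights) per track.
    """

    def __init__(self, n_features, n_tracks, lam=1.0):
        # H accumulates X^T X + lam * I and is shared by every track.
        self.H = lam * np.eye(n_features)
        # C accumulates X^T Y, one column per track.
        self.C = np.zeros((n_features, n_tracks))

    def update(self, X, Y):
        """Add the examples of a new frame: X is (n, d), Y is (n, n_tracks)."""
        self.H += X.T @ X
        self.C += X.T @ Y

    def weights(self):
        """Closed-form solution for all tracks with a single linear solve."""
        return np.linalg.solve(self.H, self.C)

    def predict(self, X):
        """Score each row of X against every track."""
        return X @ self.weights()


# Usage with random stand-in data: two "frames" of examples, 200 tracks.
rng = np.random.default_rng(0)
d, n_tracks = 16, 200
model = SharedRidgeModel(d, n_tracks, lam=0.5)
X1, Y1 = rng.normal(size=(50, d)), rng.normal(size=(50, n_tracks))
X2, Y2 = rng.normal(size=(40, d)), rng.normal(size=(40, n_tracks))
model.update(X1, Y1)
model.update(X2, Y2)
scores = model.predict(X2)  # shape (40, 200): one score per example and track
```

In this sketch the cost of the solve depends on the feature dimension rather than on the number of frames seen so far, and adding more tracks only adds columns to Y.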
