Online Video Object Segmentation via Convolutional Trident Network

TLDR

The paper proposes a semi-supervised online video object segmentation algorithm that takes user annotations of the target object in the first frame. The algorithm propagates segmentation labels from the previous frame with optical flow. Because this propagation is error-prone, it then applies a convolutional trident network (CTN) with three decoders (separative, definite foreground, and definite background), followed by Markov random field (MRF) optimization. These steps run sequentially from the second frame to the last frame to produce a segment track of the target object. Experimental results on the DAVIS benchmark show that the algorithm significantly outperforms state-of-the-art methods.
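Propagating the previous frame's labels is the step flagged above as error-prone. The sketch below shows one common way to do it: backward warping with nearest-neighbour sampling. It is an assumption for illustration, not the paper's exact procedure. The function name `propagate_mask` and the flow convention are hypothetical. The convention assumed here is a dense backward field `flow_bwd` that gives, for each pixel of frame t, its (dx, dy) displacement to frame t-1.

```python
import numpy as np


def propagate_mask(prev_mask: np.ndarray, flow_bwd: np.ndarray) -> np.ndarray:
    """Warp a binary mask from frame t-1 to frame t (hypothetical helper).

    prev_mask: (H, W) array of 0/1 labels at frame t-1.
    flow_bwd:  (H, W, 2) backward flow (dx, dy) mapping each pixel of
               frame t to its location in frame t-1 (assumed convention).
    Returns an (H, W) array of 0/1 labels at frame t.
    """
    h, w = prev_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Location in frame t-1 that each frame-t pixel came from, rounded
    # to the nearest pixel (nearest-neighbour sampling).
    src_x = np.rint(xs + flow_bwd[..., 0]).astype(int)
    src_y = np.rint(ys + flow_bwd[..., 1]).astype(int)
    # Pixels whose source falls outside the image are labelled background (0).
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(prev_mask)
    out[valid] = prev_mask[src_y[valid], src_x[valid]]
    return out


# Usage: every frame-t pixel came from one pixel to its left,
# so the single foreground pixel moves one step to the right.
prev = np.zeros((4, 4), dtype=np.uint8)
prev[1, 1] = 1
flow = np.zeros((4, 4, 2))
flow[..., 0] = -1.0
cur = propagate_mask(prev, flow)
```

Pixels whose flow points outside the frame fall back to background here. Flow errors and occlusions copy wrong labels outright, which is why a refinement stage follows the warp.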

Abstract

A semi-supervised online video object segmentation algorithm, which accepts user annotations of a target object in the first frame, is proposed in this work. We propagate the segmentation labels from the previous frame to the current frame using optical flow vectors. However, the propagation is error-prone. Therefore, we develop the convolutional trident network (CTN), which has three decoding branches: separative, definite foreground, and definite background decoders. Then, we perform Markov random field optimization based on the outputs of the three decoders. We carry out these processes sequentially from the second frame to the last frame to extract a segment track of the target object. Experimental results demonstrate that the proposed algorithm significantly outperforms state-of-the-art conventional algorithms on the DAVIS benchmark dataset.
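The abstract does not specify how the three decoder outputs enter the MRF. The following is a minimal sketch under stated assumptions:

- The separative output is treated as a per-pixel foreground probability that supplies the unary costs.
- Confident definite-foreground and definite-background pixels are pinned with large penalties.
- A 4-connected Potts term encourages spatially smooth labels.

The energy is minimized with iterated conditional modes (ICM), a simple local solver swapped in for illustration. The function `fuse_decoders_mrf` and all of its thresholds and weights are hypothetical.

```python
import numpy as np


def fuse_decoders_mrf(p_sep, p_fg, p_bg, thresh=0.5, smooth=1.0, hard=1e3, iters=5):
    """Combine three decoder maps into a binary mask via a Potts MRF (sketch).

    p_sep: (H, W) foreground probability from the separative decoder.
    p_fg:  (H, W) probability that a pixel is definitely foreground.
    p_bg:  (H, W) probability that a pixel is definitely background.
    thresh, smooth, hard, iters are illustrative assumptions, not paper values.
    """
    eps = 1e-6
    # Unary costs: channel 0 = background, channel 1 = foreground.
    cost = np.stack([-np.log(1.0 - p_sep + eps), -np.log(p_sep + eps)], axis=-1)
    # Confident definite-foreground pixels: make background very costly.
    cost[..., 0] += hard * (p_fg > thresh)
    # Confident definite-background pixels: make foreground very costly.
    cost[..., 1] += hard * (p_bg > thresh)

    labels = np.argmin(cost, axis=-1)  # initialise from unaries alone
    h, w = labels.shape
    for _ in range(iters):
        changed = False
        # ICM: visit pixels in order and pick the label that minimises
        # unary cost + Potts penalty given the current 4-neighbour labels.
        for y in range(h):
            for x in range(w):
                nbrs = [labels[yy, xx]
                        for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= yy < h and 0 <= xx < w]
                energy = [cost[y, x, lab] + smooth * sum(n != lab for n in nbrs)
                          for lab in (0, 1)]
                new = int(np.argmin(energy))
                if new != labels[y, x]:
                    labels[y, x] = new
                    changed = True
        if not changed:
            break
    return labels.astype(np.uint8)


# Usage: one weak pixel inside a confident foreground region is
# pulled to foreground by the smoothness term.
p_sep = np.full((5, 5), 0.9)
p_sep[2, 2] = 0.4
zeros = np.zeros((5, 5))
mask = fuse_decoders_mrf(p_sep, zeros, zeros)
```

ICM only finds a local minimum. For this binary energy with a non-negative Potts weight, the energy is submodular, so a graph-cut solver would return the exact minimum and is the more usual choice in practice.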
