Pano-SfMLearner: Self-Supervised Multi-Task Learning of Depth and Semantics in Panoramic Videos

Abstract

With the advent of virtual reality and augment reality applications, omnidirectional imaging and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$360^{\circ }$</tex-math></inline-formula> cameras become increasingly popular in many scenarios such as entertainment and autonomous systems. In this paper, we propose a self-supervised framework for multi-task learning on depth, camera motion and semantics from panoramic videos. Specifically, our method is based on differentiable warping of adjacent views to the target. Two improvements are provided. First, we introduce a view synthesis module based on equirectangular projection to enable direct optimization on panoramic images. Second, we introduce a self-supervised segmentation branch to involve the constraint of semantic consistency for further improvement. Extensive experiments on two <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$360^{\circ }$</tex-math></inline-formula> video and two <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$360^{\circ }$</tex-math></inline-formula> image datasets demonstrate that our method outperforms the state-of-the-art and achieves favorable cross-modality performance.

References

Page 1

	Year	Citations

Page 1