Learning Semantic Segmentation From Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach

TLDR

Synthetic data generated from virtual 3D environments is increasingly used to train semantic segmentation models, providing an alternative to manual pixel‑wise annotation. The study proposes a cross‑domain semantic segmentation approach that leverages auxiliary geometric information readily available from virtual environments. Geometry is incorporated at both the input and output levels: the image translation network is augmented with geometric information to render synthetic images in a realistic style, and a joint task network performs semantic segmentation and depth estimation together. Adversarial training on the joint output space preserves the correlation between semantics and depth. Experiments on Virtual KITTI→KITTI and SYNTHIA→Cityscapes show that the method outperforms the baselines and competing methods, confirming the benefit of geometric guidance for cross‑domain segmentation.

Abstract

As an alternative to manual pixel-wise annotation, synthetic data has been increasingly used for training semantic segmentation models. Such synthetic images and semantic labels can be easily generated from virtual 3D environments. In this work, we propose an approach to cross-domain semantic segmentation that uses auxiliary geometric information, which can also be easily obtained from virtual environments. The geometric information is utilized at two levels to reduce domain shift: at the input level, we augment the standard image translation network with the geometric information to translate synthetic images into a realistic style; at the output level, we build a task network that simultaneously performs semantic segmentation and depth estimation. Meanwhile, adversarial training is applied on the joint output space to preserve the correlation between semantics and depth. The proposed approach is validated on two synthetic-to-real dataset pairs, from Virtual KITTI to KITTI and from SYNTHIA to Cityscapes, where we achieve a clear performance gain over the baselines and various competing methods, demonstrating the effectiveness of the geometric information for cross-domain semantic segmentation.
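
The output-level idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the joint output is formed or which adversarial loss is used. The sketch assumes that the joint output is the per-pixel class probabilities concatenated with the predicted depth, and that a standard binary cross-entropy GAN objective is used. All function names (`joint_output`, `adversarial_losses`, and so on) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_output(seg_logits, depth):
    """Concatenate class probabilities and depth for every pixel.

    seg_logits: list of per-pixel logit lists (C values per pixel).
    depth: list of per-pixel predicted depths, same length.
    Assumption: concatenation is how the joint space is built here.
    """
    if len(seg_logits) != len(depth):
        raise ValueError("segmentation and depth must cover the same pixels")
    return [softmax(logits) + [d] for logits, d in zip(seg_logits, depth)]


def bce(prob, target):
    """Binary cross-entropy for a single probability and a 0/1 target."""
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(prob + eps)
             + (1.0 - target) * math.log(1.0 - prob + eps))


def adversarial_losses(d_on_source, d_on_target):
    """GAN-style losses on the joint output (assumed objective).

    d_on_source, d_on_target: discriminator's probability that a joint
    output came from the synthetic (source) domain.
    """
    # The discriminator learns to tell source outputs from target outputs.
    d_loss = bce(d_on_source, 1.0) + bce(d_on_target, 0.0)
    # The task network tries to make target outputs look like source ones.
    g_loss = bce(d_on_target, 1.0)
    return d_loss, g_loss
```

Because the discriminator sees semantics and depth together, it can penalize target-domain predictions whose class and depth patterns are individually plausible but mutually inconsistent. This is one way to read the abstract's claim that the correlation between the two is preserved.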
