Training and testing object detectors with virtual images

TLDR

Deep‑learning object detectors rely on massive labeled datasets, but collecting and annotating real‑world images is labor‑intensive, costly, and inflexible for specialized scenarios such as small objects or high occlusion. This study proposes a method to design artificial scenes and automatically generate virtual images with precise annotations for training object detectors. The authors built the ParallelEye virtual dataset, which can be used for training and testing multiple computer‑vision tasks, and investigated testing trained models on intentionally designed virtual datasets to expose their weaknesses. Training DPM and Faster R‑CNN detectors on ParallelEye combined with real datasets significantly improved performance, and experimental results confirm that the virtual dataset is viable for both training and testing object detectors.

Abstract

In computer vision, deep learning has produced a variety of state-of-the-art models that rely on massive amounts of labeled data. However, collecting and annotating real-world images demands too much labor and money, and it is usually impractical for building datasets with specific characteristics, such as small object area or high occlusion level. Under the framework of Parallel Vision, this paper presents a purposeful way to design artificial scenes and automatically generate virtual images with precise annotations. A virtual dataset named ParallelEye is built, which can be used for several computer vision tasks. Then, by training the DPM (Deformable Parts Model) and Faster R-CNN detectors, we show that model performance can be significantly improved by combining ParallelEye with publicly available real-world datasets during the training phase. In addition, we investigate the potential of testing trained models on a specific aspect using intentionally designed virtual datasets, in order to discover the flaws of the trained models. From the experimental results, we conclude that our virtual dataset is viable for training and testing object detectors.
