Abstract

Recently, deep neural networks have achieved remarkable performance in single-image localisation, where the location and orientation of the camera are estimated from a single, independent image. The main bottleneck is the requirement for large volumes of annotated data, which are usually generated using structure-from-motion approaches. In this work, we demonstrate that convolutional neural networks (CNNs) can learn from synthetic images to perform single-image localisation of real images, where the synthetic images are rendered from texture-less 3D models. We represent both real and synthetic images as segmented images, hierarchical edge maps, or a combination of both to perform the proposed domain adaptation. This adaptation therefore eliminates the need for real annotated images with ground-truth camera poses, which are otherwise obtained using structure-from-motion methods. Comprehensive experimentation shows that the adaptation achieves an improvement of 66% over the baseline experiments without adaptation.
