
Publication | Open Access

Hierarchical Scene Coordinate Classification and Regression for Visual Localization

Citations: 130

References: 44

Year: 2019

Abstract

Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The network consists of a series of output layers, each of them conditioned on the previous ones. The final output layer predicts the 3D coordinates and the others produce progressively finer discrete location labels. The proposed method outperforms the baseline regression-only network and allows us to train compact models which scale robustly to large environments. It sets a new state of the art for single-image RGB localization performance on the 7-Scenes, 12-Scenes, and Cambridge Landmarks datasets, and on three combined scenes. Moreover, for large-scale outdoor localization on the Aachen Day-Night dataset, we present a hybrid approach which outperforms existing scene coordinate regression methods and significantly reduces the performance gap w.r.t. explicit feature matching methods.
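To make the coarse-to-fine idea in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of a hierarchical prediction head: each level sees the per-pixel features plus one-hot encodings of the labels predicted by the coarser levels, and the final level regresses a 3D scene coordinate. The feature size, label counts, and random weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8            # per-pixel feature size (assumed for illustration)
K1, K2 = 4, 16   # coarse and fine label counts (assumed)

# Randomly initialized weights stand in for trained network parameters.
W1, b1 = rng.normal(size=(D, K1)), np.zeros(K1)
W2, b2 = rng.normal(size=(D + K1, K2)), np.zeros(K2)
W3, b3 = rng.normal(size=(D + K1 + K2, 3)), np.zeros(3)

def one_hot(idx, k):
    v = np.zeros(k)
    v[idx] = 1.0
    return v

def hierarchical_head(feat):
    """Coarse label -> finer label -> 3D coordinate, each conditioned on the
    predictions of the previous (coarser) output layers."""
    l1 = int(np.argmax(feat @ W1 + b1))                    # coarse region label
    h1 = one_hot(l1, K1)
    l2 = int(np.argmax(np.concatenate([feat, h1]) @ W2 + b2))  # finer label
    h2 = one_hot(l2, K2)
    xyz = np.concatenate([feat, h1, h2]) @ W3 + b3         # 3D scene coordinate
    return l1, l2, xyz

feat = rng.normal(size=D)
l1, l2, xyz = hierarchical_head(feat)
print(l1, l2, xyz.shape)
```

Conditioning each output layer on the coarser labels is what lets the regression stage specialize to a small region of an otherwise large, ambiguous scene.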
