Concepedia

Abstract

Although cluttered indoor scenes have a lot of useful high-level semantic information which can be used for mapping and localization, most visual odometry (VO) algorithms rely on the usage of geometric features such as points, lines, and planes. Lately, driven by this idea, the joint optimization of semantic labels and estimating odometry has gained popularity in the robotics community. This joint optimization method is accurate but is generally very slow. At the same time, in the vision community, direct and sparse approaches for VO have stricken the right balance between speed and accuracy. We merge the successes of these two communities and present a preprocessing method to incorporate semantic information in the form of visual saliency to direct sparse odometry (DSO)-a highly successful direct sparse VO algorithm. We also present a framework to filter the visual saliency based on scene parsing. Our framework SalientDSO relies on the widely successful deep learning-based approaches for visual saliency and scene parsing, which drives the feature selection for obtaining highly accurate and robust VO even in the presence of as few as 40 point features per frame. We provide an extensive quantitative evaluation of SalientDSO on the ICL-NUIM and the TUM monoVO data sets and show that we outperform DSO and ORB-simultaneous localization and mapping-two very popular state-of-the-art approaches in the literature. We also collect and publicly release a CVL-UMD data set which contains two indoor cluttered sequences on which we show qualitative evaluations. To the best of our knowledge, this is the first paper to use visual saliency and scene parsing to drive the feature selection in direct VO.

References

YearCitations

Page 1