Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs

TLDR

Predicting the depth (or surface normal) of a scene from a single monocular color image is a challenging, underdetermined problem. This paper tackles it by regressing on deep convolutional neural network (DCNN) features and refining the results with conditional random fields (CRFs). The framework first uses a DCNN to learn a mapping from multi-scale image patches to super-pixel depth or normal values, then refines these estimates to the pixel level with a CRF that incorporates data, smoothness, and auto-regression terms. Inference admits a closed-form solution, and experiments on Make3D and NYU Depth V2 show competitive performance against recent state-of-the-art methods.

Abstract

Predicting the depth (or surface normal) of a scene from a single monocular color image is a challenging task. This paper tackles this challenging and essentially underdetermined problem by regression on deep convolutional neural network (DCNN) features, combined with a post-processing refinement step using conditional random fields (CRFs). Our framework works at two levels: the super-pixel level and the pixel level. First, we design a DCNN model to learn the mapping from multi-scale image patches to depth or surface normal values at the super-pixel level. Second, the estimated super-pixel depth or surface normal is refined to the pixel level by exploiting various potentials on the depth or surface normal map, which include a data term, a smoothness term among super-pixels, and an auto-regression term characterizing the local structure of the estimation map. The inference problem can be solved efficiently because it admits a closed-form solution. Experiments on the Make3D and NYU Depth V2 datasets show competitive results compared with recent state-of-the-art methods.
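
To illustrate why a CRF built from a data term, a smoothness term, and an auto-regression term can admit a closed-form solution, here is a minimal sketch under stated assumptions. It is not the paper's exact model: the quadratic form of each potential, the weights lam and mu, the matrices W and A, and the function name refine_quadratic_crf are all hypothetical choices made for illustration. When every potential is quadratic in the unknown map d, setting the gradient of the energy to zero gives a single linear system.

```python
import numpy as np

def refine_quadratic_crf(z, W, A, lam=1.0, mu=1.0):
    """Minimise an assumed quadratic energy over d:

        ||d - z||^2                           (data term)
      + lam * sum_{i<j} W_ij (d_i - d_j)^2    (smoothness term)
      + mu  * ||d - A d||^2                   (auto-regression term)

    W is a symmetric non-negative affinity matrix; row i of A holds the
    weights that predict d_i from its neighbours. Both are assumptions.
    """
    z = np.asarray(z, dtype=float)
    n = z.shape[0]
    I = np.eye(n)
    # Graph Laplacian: d^T L d equals the smoothness sum for symmetric W.
    L = np.diag(W.sum(axis=1)) - W
    R = I - A
    # Zero gradient gives (I + lam*L + mu*R^T R) d = z.
    H = I + lam * L + mu * (R.T @ R)
    return np.linalg.solve(H, z)

# Toy example: five nodes on a chain with unit affinities.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
# Auto-regression weights: each node is predicted by its neighbours' mean.
A = W / W.sum(axis=1, keepdims=True)
# Coarse, noisy initial estimates standing in for the regression output.
z = np.array([1.0, 1.4, 0.9, 2.1, 2.0])
d = refine_quadratic_crf(z, W, A, lam=1.0, mu=0.5)
```

The system matrix is symmetric positive definite whenever lam and mu are non-negative, so the solution is unique. For image-sized problems a sparse solver would be the natural choice; the dense solve here only keeps the sketch short.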
