3-D Depth Reconstruction from a Single Still Image

TLDR

The study estimates 3-D depth from a single still image. Depth estimation is challenging because local features alone are insufficient to estimate depth at a point, so the global context of the image must also be considered. Using supervised learning, the authors train a hierarchical, multiscale Markov Random Field (MRF) on monocular images paired with ground-truth depthmaps. The MRF incorporates multiscale local and global image features and models both the depths and the relations between depths at different points in the image. Even on unstructured indoor and outdoor scenes, the method frequently recovers fairly accurate depthmaps. A further model that combines monocular cues with stereo (triangulation) cues yields significantly more accurate depth estimates than either cue alone.

Abstract

We consider the task of 3-d depth estimation from a single still image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured indoor and outdoor environments which include forests, sidewalks, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the value of the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a hierarchical, multiscale Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models the depths and the relation between depths at different points in the image. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps. We further propose a model that incorporates both monocular cues and stereo (triangulation) cues, to obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone.
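To make the MRF idea concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' model. The abstract describes a hierarchical, multiscale MRF, and this sketch collapses that to a single-scale Gaussian MRF over a 2-D grid of patches. Each patch gets a monocular estimate `m_i = theta . x_i`, which is linear in some hypothetical feature vector `x_i`. Neighbouring depths are tied together by a quadratic smoothness term. An optional stereo term pulls each patch toward a triangulated depth `s_i`, weighted by an assumed per-patch uncertainty. This is one plausible way to write the monocular-plus-stereo fusion. The names `monocular_prediction`, `map_depths`, `sigma_mono`, `sigma_smooth` and `sigma_stereo` are all illustrative. Because the energy is quadratic, the MAP estimate solves a linear system, which the sketch approximates with Gauss-Seidel sweeps.

```python
# Hypothetical, simplified sketch: single-scale Gaussian MRF for depth.
# Not the paper's hierarchical multiscale model; names are illustrative.

def monocular_prediction(features, theta):
    """Per-patch depth estimate m_i = theta . x_i (linear in assumed features)."""
    return [[sum(t * x for t, x in zip(theta, feat)) for feat in row]
            for row in features]


def map_depths(mono, sigma_mono, sigma_smooth,
               stereo=None, sigma_stereo=None, iters=200):
    """Approximate MAP depths for the quadratic energy

        E(d) = sum_i (d_i - m_i)^2 / sigma_mono^2
             + sum_{i~j} (d_i - d_j)^2 / sigma_smooth^2   (4-neighbour grid)
             + sum_i (d_i - s_i)^2 / sigma_stereo_i^2      (only where s_i known)

    using Gauss-Seidel sweeps. Each update sets d_i to the exact minimizer
    of E with all other depths held fixed.
    """
    rows, cols = len(mono), len(mono[0])
    d = [row[:] for row in mono]          # start from the monocular estimate
    a = 1.0 / sigma_mono ** 2             # weight of the monocular data term
    b = 1.0 / sigma_smooth ** 2           # weight of each smoothness edge
    for _ in range(iters):
        for r in range(rows):
            for c in range(cols):
                num = a * mono[r][c]
                den = a
                # Smoothness: pull toward the current depths of grid neighbours.
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= rr < rows and 0 <= cc < cols:
                        num += b * d[rr][cc]
                        den += b
                # Optional stereo (triangulation) cue where one is available.
                if stereo is not None and stereo[r][c] is not None:
                    w = 1.0 / sigma_stereo[r][c] ** 2
                    num += w * stereo[r][c]
                    den += w
                d[r][c] = num / den
    return d


if __name__ == "__main__":
    feats = [[[1.0, 2.0], [1.0, 4.0]],
             [[1.0, 3.0], [1.0, 5.0]]]
    theta = [0.5, 1.0]
    mono = monocular_prediction(feats, theta)   # [[2.5, 4.5], [3.5, 5.5]]
    print(map_depths(mono, sigma_mono=1.0, sigma_smooth=1.0))
```

A small `sigma_smooth` spreads depth information across neighbouring patches, loosely standing in for the global context the abstract says is needed. A patch with a confident stereo estimate (small `sigma_stereo`) is pulled strongly toward it, and its neighbours are pulled along through the smoothness term. For a real image one would more likely assemble the sparse linear system and solve it directly. The Gauss-Seidel loop is used here only to keep the sketch dependency-free.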
