Publication | Closed Access
Learning to Recover 3D Scene Shape from a Single Image
190
Citations
39
References
2021
Year
Unknown Venue
Engineering, Machine Learning, Depth Map, Depth Shift, 3D Computer Vision, Image Analysis, Data Science, Pattern Recognition, Unknown Depth Shift, Monocular Depth Estimation, Computational Imaging, Computational Geometry, Geometric Modeling, Machine Vision, Deep Learning, Computer Vision, 3D Vision, Natural Sciences, Dense Reconstruction, Scene Shape, Scene Understanding, 3D Reconstruction, Multi-view Geometry, Scene Modeling
Despite significant progress in monocular depth estimation, current state-of-the-art methods cannot recover accurate 3D scene shape, for two reasons: the shift-invariant reconstruction losses used in training leave an unknown depth shift, and the camera focal length may be unknown. The study investigates this depth-shift problem and proposes a two-stage framework that first estimates depth up to an unknown scale and shift from a single image, then uses 3D point-cloud encoders to infer the missing depth shift and focal length. It also introduces an image-level normalized regression loss and a normal-based geometry loss to improve depth models trained on mixed datasets. The depth model achieves state-of-the-art zero-shot generalization on nine unseen datasets. Code is available at https://git.io/Depth.
Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and a possibly unknown camera focal length. We investigate this problem in detail, and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth
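To illustrate why both the depth shift and the focal length matter for 3D shape, the following is a minimal sketch of back-projecting a depth map to a point cloud with a standard pinhole camera model. It is not the authors' implementation: the function name `unproject`, the choice of the image centre as principal point, and the toy 3x3 depth map are assumptions made for illustration only. In the paper the shift and focal length are predicted by point cloud encoders; here they are simply passed in as arguments.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def unproject(depth: List[List[float]],
              focal_length: float,
              shift: float = 0.0) -> List[Point]:
    """Back-project a depth map to 3D points (hypothetical helper).

    Assumes a pinhole camera whose principal point is the image centre.
    `shift` is added to every depth value before back-projection.
    """
    height = len(depth)
    width = len(depth[0])
    # Assumption: principal point at the centre of the pixel grid.
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0

    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            z = d + shift                      # corrected depth
            x = (u - cx) * z / focal_length    # pinhole back-projection
            y = (v - cy) * z / focal_length
            points.append((x, y, z))
    return points


# Toy input: a flat 3x3 depth map with every pixel at depth 1.0.
depth = [[1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0]]
points = unproject(depth, focal_length=2.0, shift=1.0)
```

Because `x` and `y` both depend on the shifted depth and on `focal_length`, an error in either quantity changes the lateral extent of the points relative to their depth, which is why a depth map known only up to scale and shift is not enough on its own to reconstruct the scene geometry.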