
TLDR

Reconstructing dense 3D models from multi-view images is challenging. CNN-based methods learn the task from data but ignore perspective geometry and occlusion, while classical MRFs with ray potentials model these physical processes but cannot cope with large variations in surface appearance across viewpoints. This work proposes RayNet, a hybrid framework that couples a CNN that learns view-invariant features with an MRF that encodes perspective projection and occlusion, trained end-to-end by empirical risk minimization. Experiments on challenging real-world datasets show that RayNet outperforms a piece-wise trained baseline, hand-crafted models, and other learning-based methods.

Abstract

In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNN) allow learning the entire task from data. However, they do not incorporate the physics of image formation, such as perspective geometry and occlusion. In contrast, classical approaches based on Markov Random Fields (MRF) with ray potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. We propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models, and other learning-based approaches.
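To make the occlusion reasoning concrete, here is a minimal sketch of how occupancy along a single camera ray can be turned into a distribution over the first visible surface. It is not the paper's implementation: it assumes independent per-voxel occupancy probabilities, and the function names, the `background_depth` parameter, and the example values are hypothetical. A voxel can only be the first one hit if every voxel closer to the camera is empty.

```python
from typing import List


def first_hit_distribution(occupancy: List[float]) -> List[float]:
    """Probability that each voxel along a ray is the first occupied one.

    `occupancy[i]` is the (assumed independent) probability that the i-th
    voxel along the ray is occupied, ordered from near to far.
    The returned list has one extra final entry: the probability that the
    ray passes through every voxel without hitting anything.
    """
    probs = []
    free_so_far = 1.0  # probability that all closer voxels are empty
    for o in occupancy:
        probs.append(free_so_far * o)  # visible here only if nothing occludes
        free_so_far *= 1.0 - o
    probs.append(free_so_far)  # ray escapes the volume
    return probs


def expected_depth(occupancy: List[float], depths: List[float],
                   background_depth: float) -> float:
    """Expected depth along the ray under the first-hit distribution."""
    probs = first_hit_distribution(occupancy)
    hit = sum(p * d for p, d in zip(probs[:-1], depths))
    return hit + probs[-1] * background_depth


if __name__ == "__main__":
    occ = [0.5, 0.5, 1.0]      # hypothetical occupancies, near to far
    print(first_hit_distribution(occ))
    print(expected_depth(occ, depths=[1.0, 2.0, 3.0], background_depth=10.0))
```

Because the expected depth is a differentiable function of the occupancies, a loss on it could in principle be backpropagated into whatever produces those occupancies, which is the kind of coupling that end-to-end training relies on.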
