Publication | Closed Access
OctNetFusion: Learning Depth Fusion from Data
Citations: 184
References: 46
Year: 2017
Venue: unknown
Topics: Engineering, Machine Learning, Depth Map, Noise Reduction, 3D Computer Vision, Image Analysis, Data Science, Computational Geometry, Learning Depth Fusion, Geometric Modeling, Machine Vision, Computer Science, Medical Image Computing, Deep Learning, 3D Object Recognition, Computer Vision, 3D Vision, Natural Sciences, Dense Reconstruction, Input Depth Maps, Scene Modeling
Traditional depth fusion averages truncated signed distance functions (TSDFs), which, while simple, cannot reconstruct occluded surfaces and requires many frames to suppress noise and outliers. This work instead trains a 3D convolutional neural network to predict an implicit surface representation directly from multiple input depth maps, replacing explicit TSDF averaging. The learned fusion outperforms both conventional TSDF averaging and TV-L1 fusion: it reduces noise and outliers, reconstructs occluded regions, and achieves state-of-the-art 3D shape completion.
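For context, the conventional baseline can be sketched in a few lines. The function below is a minimal, illustrative TSDF-fusion step in the spirit of Curless and Levoy: every voxel is projected into the depth map and its truncated signed distance is folded into a running weighted average. It assumes a pinhole camera with intrinsics `K`, a 4x4 world-to-camera transform, and uniform per-frame weights; all names are illustrative, not taken from the paper's code.

```python
# Minimal sketch of vanilla TSDF fusion (Curless & Levoy style):
# each depth map is integrated into a voxel grid by a running
# weighted average of truncated signed distances.
import numpy as np

def integrate_depth(tsdf, weights, depth, K, cam_T_world, voxel_origin,
                    voxel_size, trunc=0.05):
    """Fuse one depth map into the TSDF volume (all names illustrative)."""
    nx, ny, nz = tsdf.shape
    # World coordinates of all voxel centers.
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    pts = voxel_origin + voxel_size * np.stack([ii, jj, kk], axis=-1)
    # Transform voxel centers into the camera frame.
    pts_h = np.concatenate([pts, np.ones_like(pts[..., :1])], axis=-1)
    cam = pts_h @ cam_T_world.T
    x, y, z = cam[..., 0], cam[..., 1], cam[..., 2]
    # Project with pinhole intrinsics K.
    u = np.round(K[0, 0] * x / np.maximum(z, 1e-8) + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * y / np.maximum(z, 1e-8) + K[1, 2]).astype(int)
    h, w = depth.shape
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.where(valid, depth[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)], 0.0)
    valid &= d > 0
    # Signed distance along the viewing ray, truncated to [-trunc, trunc].
    sdf = np.clip(d - z, -trunc, trunc)
    # Update voxels in front of or within the truncation band of the surface.
    update = valid & (d - z >= -trunc)
    w_new = weights + update  # uniform per-frame weight of 1
    tsdf[update] = (tsdf[update] * weights[update] + sdf[update]) / w_new[update]
    weights[:] = w_new

# Usage: allocate tsdf = np.full(grid_shape, trunc) and
# weights = np.zeros(grid_shape), then call integrate_depth
# once per input depth map.
```

The averaging step makes the limitations the paper targets easy to see: voxels that are never observed keep their initial value (no completion of occluded geometry), and noise only averages out as the number of fused frames grows.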
In this paper, we present a learning-based approach to depth fusion, i.e., dense 3D reconstruction from multiple depth images. The most common approach to depth fusion is based on averaging truncated signed distance functions, which was originally proposed by Curless and Levoy in 1996. While this method is simple and provides great results, it is not able to reconstruct (partially) occluded surfaces and requires a large number of frames to filter out sensor noise and outliers. Motivated by the availability of large 3D model repositories and recent advances in deep learning, we present a novel 3D CNN architecture that learns to predict an implicit surface representation from the input depth maps. Our learning-based method significantly outperforms the traditional volumetric fusion approach in terms of noise reduction and outlier suppression. By learning the structure of real-world 3D objects and scenes, our approach is further able to reconstruct occluded regions and to fill in gaps in the reconstruction. We demonstrate that our learning-based approach outperforms both vanilla TSDF fusion and TV-L1 fusion on the task of volumetric fusion. Further, we demonstrate state-of-the-art 3D shape completion results.
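To make the learned alternative concrete, here is a hedged sketch of the idea: a 3D CNN that takes a noisily fused TSDF grid (plus a per-voxel observation-weight channel) and regresses a refined implicit surface. This is not the authors' OctNetFusion architecture, which operates on efficient octree grids; the dense PyTorch encoder-decoder below, with illustrative layer sizes, only shows the input/output structure of such a network.

```python
# Illustrative dense 3D CNN for learned depth fusion: maps a noisy,
# incomplete TSDF volume + weight grid to a refined TSDF. Not the
# paper's octree-based OctNetFusion network; layer sizes are arbitrary.
import torch
import torch.nn as nn

class DenseFusionNet(nn.Module):
    def __init__(self, feat=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(2, feat, 3, padding=1), nn.ReLU(inplace=True),
            # Downsample by 2 to aggregate spatial context.
            nn.Conv3d(feat, 2 * feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            # Upsample back to the input resolution.
            nn.ConvTranspose3d(2 * feat, feat, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat, 1, 3, padding=1),  # refined TSDF value per voxel
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Usage: stack the fused TSDF and its weight grid as two input channels.
net = DenseFusionNet()
x = torch.randn(1, 2, 64, 64, 64)   # (batch, channels, depth, height, width)
refined_tsdf = net(x)               # (1, 1, 64, 64, 64)
```

Because the network is trained on large 3D model repositories, it can learn object- and scene-level priors, which is what lets a method like this fill in occluded regions that per-voxel averaging leaves empty.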
[Citations-per-year chart omitted; axes were Year vs. Citations.]