Publication | Closed Access
A Point Set Generation Network for 3D Object Reconstruction from a Single Image
Citations: 2.4K · References: 17 · Year: 2017 · Venue: unknown
Topics: Point Cloud Coordinates, Engineering, Machine Learning, Point Cloud Processing, Computer-aided Design, Point Cloud, 3D Computer Vision, Image Analysis, Differentiable Rendering, Data Science, Single Image, Computational Geometry, Geometric Modeling, Machine Vision, Computer Science, Medical Image Computing, Deep Learning, 3D Object Recognition, 3D Data Processing, Computer Vision, Deep Neural Networks, 3D Vision, Natural Sciences, Dense Reconstruction, Conditional Shape Sampler, 3D Reconstruction, Multi-view Geometry, Scene Modeling, Object Reconstruction
Deep neural networks are increasingly used to generate 3D data, yet most existing methods rely on regular representations such as voxel grids or image collections, which obscure the invariance of shapes under geometric transformations and suffer from other limitations; moreover, the ground-truth shape for a single input image can be ambiguous. This paper reconstructs 3D shapes from a single image by directly predicting point-cloud coordinates, addressing that output ambiguity with a novel architecture, loss function, and learning paradigm. The resulting model is a conditional shape sampler that predicts multiple plausible point-cloud reconstructions from one image. Experiments show the system outperforms state-of-the-art single-image 3D reconstruction methods, performs strongly on shape completion, and can produce multiple plausible predictions.
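The "conditional shape sampler" idea can be illustrated with a minimal sketch: condition a predictor on an image feature plus a random code, draw several samples, and penalize only the best one against the single ground-truth shape, so different codes can specialize to different plausible reconstructions. Everything here is illustrative (the `sampler` function and `set_distance` helper are stand-ins, not the paper's architecture or training code):

```python
import numpy as np

rng = np.random.default_rng(0)

def set_distance(p, q):
    # Symmetric nearest-neighbor (Chamfer-style) distance between
    # point sets p (N, 3) and q (M, 3), invariant to point ordering.
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def sampler(image_feature, noise):
    # Stand-in for a learned conditional sampler: maps an image feature
    # vector plus a random code to an (N, 3) point set. Purely illustrative.
    return np.tanh(np.outer(noise, image_feature)[:, :3])

def min_of_n_loss(image_feature, ground_truth, n_samples=4):
    # Draw several random codes and keep only the best sample's loss,
    # so the sampler is free to cover multiple plausible shapes.
    losses = [
        set_distance(
            sampler(image_feature, rng.standard_normal(ground_truth.shape[0])),
            ground_truth,
        )
        for _ in range(n_samples)
    ]
    return min(losses)
```

A usage sketch: `min_of_n_loss(np.ones(8), np.zeros((5, 3)))` returns the smallest of four sampled set distances; in training, gradients would flow only through that best sample.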
Generation of 3D data by deep neural networks has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output -- point cloud coordinates. Along with this problem arises a unique and interesting issue: the ground-truth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in the ground truth, we design an architecture, loss function, and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments, not only does our system outperform state-of-the-art methods on single-image 3D reconstruction benchmarks, it also shows strong performance on 3D shape completion and a promising ability to make multiple plausible predictions.
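The "unorthodox output form" above, an unordered point set, requires a loss that is invariant to point ordering. One common choice for comparing point sets in this setting is the symmetric Chamfer distance, sketched here in plain NumPy (an illustrative sketch, not the authors' code):

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3).

    For each point in one set, find its nearest neighbor in the other
    set; sum the mean squared nearest-neighbor distances in both
    directions. The result does not depend on point ordering.
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    diff = p[:, None, :] - q[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Nearest-neighbor terms in both directions.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Identical sets have zero distance; a unit shift along y costs 1 + 1 = 2.
a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(a, a))  # 0.0
```

The O(N·M) pairwise matrix is fine for small point sets; large clouds would typically use a KD-tree or batched GPU nearest-neighbor search instead.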