Publication | Closed Access
Multi-view 3D Reconstruction with Transformers
Citations: 104
References: 21
Year: 2021
Convolutional Neural Network, Engineering, Machine Learning, Feature Extraction, Computer-aided Design, Multi-view Geometry, 3D Computer Vision, Image Analysis, Data Science, Computational Geometry, Video Transformer, Geometric Modeling, View Feature Extraction, Machine Vision, 3D Video, Computer Science, Multi-view 3D, Deep Learning, 3D Object Recognition, Computer Vision, 3D Vision, Natural Sciences, 3D Reconstruction, Deep CNN-based Methods
Deep CNN-based methods have so far achieved state-of-the-art results in multi-view 3D object reconstruction. Despite this considerable progress, the two core modules of these methods, view feature extraction and multi-view fusion, are usually investigated separately, and the relations among multiple input views are rarely explored. Inspired by the recent success of Transformer models, we reformulate multi-view 3D reconstruction as a sequence-to-sequence prediction problem and propose a framework named 3D Volume Transformer. Unlike previous CNN-based methods, which use a separate design for each module, we unify feature extraction and view fusion in a single Transformer network. A natural advantage of our design is that self-attention explores view-to-view relationships among multiple unordered inputs. On ShapeNet, a large-scale 3D reconstruction benchmark, our method achieves new state-of-the-art accuracy in multi-view reconstruction with 70% fewer parameters than CNN-based methods. Experimental results also suggest that our method scales well. Our code will be made publicly available.
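The idea of self-attention over unordered views can be illustrated with a minimal sketch. This is not the paper's 3D Volume Transformer: it assumes a single attention head with identity query/key/value projections, no positional encoding, and a hypothetical mean-pooling `fuse` step, chosen only to show that each view's feature becomes a weighted mix of all views and that the fused result does not depend on view order.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(views):
    """Single-head self-attention with identity projections (illustrative assumption).

    views: list of per-view feature vectors of equal length.
    Returns one updated vector per view, each a softmax-weighted mix of all views.
    """
    d = len(views[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in views:
        # Scaled dot-product score between this view and every view.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in views]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, views)) for i in range(d)])
    return out


def fuse(views):
    """Hypothetical fusion step: mean-pool the attended view features."""
    attended = self_attention(views)
    n = len(attended)
    return [sum(v[i] for v in attended) / n for i in range(len(attended[0]))]


# Three toy view features (made-up numbers, not data from the paper).
views = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.4, 0.4, 0.8]]
fused_a = fuse(views)
fused_b = fuse(list(reversed(views)))  # same views, different order
```

Because no positional encoding is used, reordering the views only reorders the attended outputs, so `fused_a` and `fused_b` agree up to floating-point rounding.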