Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation

TLDR

The paper studies generating 3D meshes from a few color images with known camera poses. It aims to improve shape quality by using cross-view information with a graph convolutional network. Inspired by traditional multi-view geometry, the model refines a coarse mesh iteratively by predicting a series of deformations. At each step it samples the area around each vertex and chooses a deformation using perceptual feature statistics pooled from multiple images. Experiments show that the model produces accurate 3D shapes that look plausible from the input views and stay well aligned from arbitrary viewpoints. The model also generalizes across semantic categories, numbers of input images, and mesh initialization quality.

Abstract

We study the problem of shape generation in 3D mesh representation from a few color images with known camera poses. Many previous works learn to hallucinate the shape directly from priors. We instead aim to further improve shape quality by leveraging cross-view information with a graph convolutional network. Rather than building a direct mapping function from images to 3D shape, our model learns to predict a series of deformations that iteratively improve a coarse shape. Inspired by traditional multiple view geometry methods, our network samples the area around each vertex of the initial mesh. It then reasons about an optimal deformation using perceptual feature statistics built from multiple input images. Extensive experiments show that our model produces accurate 3D shapes that are visually plausible from the input perspectives and also well aligned to arbitrary viewpoints. With the help of a physically driven architecture, our model also generalizes across different semantic categories, numbers of input images, and qualities of mesh initialization.
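The abstract describes the deformation step only at a high level. The NumPy sketch below shows one plausible reading of it. It samples hypothesis positions around each vertex and projects them into every view with the known cameras. It pools the sampled features into cross-view statistics (mean, max, standard deviation), so the feature size does not depend on the number of views. It then scores the hypotheses and moves each vertex to the softmax-weighted average of its hypotheses.

Several details are assumptions made for illustration and are not taken from the source: the 42 hypotheses, the 0.02 radius, the Fibonacci-sphere layout, the pinhole camera model, and the linear scorer that stands in for a learned graph convolutional network. All function names are hypothetical. This is not the authors' implementation.

```python
import numpy as np

def sample_hypotheses(vertices, num=42, radius=0.02):
    """(V, 3) -> (V, num + 1, 3): each vertex plus `num` points on a small sphere.
    The count, radius and Fibonacci-sphere layout are assumptions of this sketch."""
    i = np.arange(num) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / num)              # polar angle
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i            # golden-angle azimuth
    dirs = np.stack([np.cos(theta) * np.sin(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(phi)], axis=-1)           # (num, 3) unit vectors
    offsets = np.concatenate([np.zeros((1, 3)), radius * dirs], axis=0)
    return vertices[:, None, :] + offsets[None, :, :]

def project(points, K, R, t):
    """Pinhole projection (assumed camera model) of (..., 3) points to (..., 2) pixels."""
    cam = points @ R.T + t                            # world -> camera frame
    uv = cam @ K.T                                    # camera -> homogeneous pixels
    return uv[..., :2] / uv[..., 2:3]

def bilinear_sample(feat, uv):
    """Bilinearly sample a (C, H, W) feature map at (..., 2) pixel coords (x, y)."""
    C, H, W = feat.shape
    x = np.clip(uv[..., 0], 0, W - 1)
    y = np.clip(uv[..., 1], 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    f = (feat[:, y0, x0] * (1 - wx) * (1 - wy) + feat[:, y0, x1] * wx * (1 - wy)
         + feat[:, y1, x0] * (1 - wx) * wy + feat[:, y1, x1] * wx * wy)
    return np.moveaxis(f, 0, -1)                      # (..., C)

def cross_view_statistics(hypotheses, feats, cameras):
    """Concatenate mean, max and std over views -> (V, num + 1, 3C)."""
    per_view = np.stack([bilinear_sample(f, project(hypotheses, K, R, t))
                         for f, (K, R, t) in zip(feats, cameras)])  # (N, V, num+1, C)
    return np.concatenate([per_view.mean(0), per_view.max(0), per_view.std(0)], axis=-1)

def deform(vertices, feats, cameras, score_weights):
    """One refinement step: score hypotheses, softmax over them, and move each
    vertex to the score-weighted average. The linear scorer is a hypothetical
    stand-in for a learned graph convolutional network."""
    hyp = sample_hypotheses(vertices)
    scores = cross_view_statistics(hyp, feats, cameras) @ score_weights  # (V, num+1)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return (w[..., None] * hyp).sum(axis=1)           # (V, 3)

def rot_y(a):
    """Rotation about the y axis, used here to place toy cameras around the object."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Toy usage: random feature maps stand in for CNN features of three views.
rng = np.random.default_rng(0)
C, H, W = 8, 32, 32
feats = [rng.standard_normal((C, H, W)) for _ in range(3)]
K = np.array([[30.0, 0.0, 16.0], [0.0, 30.0, 16.0], [0.0, 0.0, 1.0]])
cameras = [(K, rot_y(a), np.array([0.0, 0.0, 2.0])) for a in (0.0, 0.6, -0.6)]
verts = rng.uniform(-0.3, 0.3, size=(10, 3))
new_verts = deform(verts, feats, cameras, rng.standard_normal(3 * C))
print(new_verts.shape)  # (10, 3)
```

Each new position in this sketch is a convex combination of the hypotheses, so one step moves a vertex by at most the sampling radius. Larger corrections therefore need repeated steps, which is consistent with the iterative refinement the abstract describes.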
