Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior

TLDR

Creating high-fidelity 3D content from a single image is inherently challenging because it requires estimating the underlying geometry while hallucinating unseen textures. Make-It-3D addresses this by using a well-trained 2D diffusion model as 3D-aware supervision in a two-stage pipeline. The first stage optimizes a neural radiance field, constrained by the reference image at the frontal view and by the diffusion prior at novel views. The second stage converts the coarse model into textured point clouds and further improves realism with the diffusion prior while reusing the high-quality textures of the reference image. Experiments show that Make-It-3D outperforms prior methods by a large margin, producing faithful reconstructions and impressive visual quality. The authors present it as the first attempt at high-quality 3D creation from a single image for general objects, and it enables applications such as text-to-3D creation and texture editing.

Abstract

In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it essentially involves estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage prior knowledge from a well-trained 2D diffusion model to act as 3D-aware supervision for 3D creation. Our approach, Make-It-3D, employs a two-stage optimization pipeline: the first stage optimizes a neural radiance field by incorporating constraints from the reference image at the frontal view and diffusion prior at novel views; the second stage transforms the coarse model into textured point clouds and further elevates the realism with diffusion prior while leveraging the high-quality textures from the reference image. Extensive experiments demonstrate that our method outperforms prior works by a large margin, resulting in faithful reconstructions and impressive visual quality. Our method presents the first attempt to achieve high-quality 3D creation from a single image for general objects and enables various applications such as text-to-3D creation and texture editing.
