Vista3D: Unravel the 3D Darkside of a Single Image

Qiuhong Shen, Xingyi Yang, Michael Bi Mi, Xinchao Wang

TL;DR

Vista3D is a framework that generates swift and consistent 3D objects from a single image within 5 minutes. It improves generation quality by using a disentangled representation with two independent implicit functions to capture both the visible and obscured aspects of objects.

Abstract

We embark on the age-old quest: unveiling the hidden dimensions of objects from mere glimpses of their visible parts. To address this, we present Vista3D, a framework that realizes swift and consistent 3D generation within a mere 5 minutes. At the heart of Vista3D lies a two-phase approach: the coarse phase and the fine phase. In the coarse phase, we rapidly generate initial geometry with Gaussian Splatting from a single image. In the fine phase, we extract a Signed Distance Function (SDF) directly from the learned Gaussian Splatting, optimizing it with a differentiable isosurface representation. Furthermore, Vista3D elevates the quality of generation by using a disentangled representation with two independent implicit functions to capture both visible and obscured aspects of objects. Additionally, it harmonizes gradients from a 2D diffusion prior with those from 3D-aware diffusion priors through angular diffusion prior composition. Through extensive evaluation, we demonstrate that Vista3D effectively sustains a balance between the consistency and diversity of the generated 3D objects. Demos and code will be available at https://github.com/florinshen/Vista3D.
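
For readers unfamiliar with the term, a signed distance function maps a 3D point to its distance from a surface, negative inside and positive outside, so the surface itself is the zero level set. The sketch below illustrates this with an analytic unit sphere. It is only an illustration of the concept: the names `sphere_sdf` and `classify` are hypothetical, and this is not Vista3D's code, where the SDF is extracted from the learned Gaussian Splatting and then optimized.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to a sphere.

    Negative inside the sphere, zero on its surface, positive outside.
    """
    return math.dist(point, center) - radius


def classify(point, eps=1e-9):
    """Label a point relative to the unit sphere's zero level set."""
    d = sphere_sdf(point)
    if abs(d) <= eps:
        return "surface"
    return "inside" if d < 0 else "outside"


if __name__ == "__main__":
    for p in [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]:
        print(p, sphere_sdf(p), classify(p))
```

An isosurface extraction step turns such a function into a mesh by locating where its value crosses zero.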

Paper Structure

This paper contains 21 sections, 6 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: 3D Darkside of Single Image. By employing various text prompts, Vista3D is capable of unveiling the diversity of unseen views while retaining 3D consistency and detail. Two novel views and the normal map are visualized for each text prompt.
  • Figure 2: Overview of Vista3D. We generate a high-fidelity mesh from a single input image in a coarse-to-fine manner. In the coarse stage, we utilize Gaussian Splatting to learn a coarse geometry with a 3D-aware 2D diffusion prior. We then extract signed distance fields from Gaussian Splatting for refinement. In the refinement stage, another 2D diffusion prior is enabled with an angular-based composition to explore a diverse darkside while retaining 3D consistency.
  • Figure 3: Qualitative Comparison on image-to-3D generation. We compare our Vista3D-S with DreamGaussian and Magic123. Vista3D-S takes only 5 minutes to reconstruct a single 3D object, yielding competitive geometry and more consistent textures compared to Magic123, with a $20\times$ speedup.
  • Figure 4: Qualitative Comparison with One-2-3-45 and Wonder3D. In this comparison, we render two views of each 3D object generated by One-2-3-45 and Wonder3D. For Vista3D-L, we list the text prompts used to generate each 3D object and show three rendered views alongside a single normal map for a comprehensive comparison.
  • Figure 5: Ablation study of overall framework and disentangled texture.
  • ...and 4 more figures