ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization

Onat Şahin, Mohammad Altillawi, George Eskandar, Carlos Carbone, Ziyuan Liu

TL;DR

ConsistentDreamer addresses view-inconsistent image-to-3D generation by coupling fixed multi-view priors with SDS-guided unseen views to regularize a Gaussian-based 3D representation. The method balances rough base-shape optimization and fine-detail reconstruction through dynamic, uncertainty-based loss weights, while enforcing surface fidelity via opacity, depth distortion, and normal alignment losses. Empirical results on multiple benchmarks show improved view consistency and competitive perceptual quality compared with state-of-the-art methods, with robust performance across varying initial multi-view sources. This approach offers a practical pathway to high-fidelity, view-consistent 3D assets suitable for embodied AI simulations and beyond.

Abstract

Recent advances in diffusion models have significantly improved 3D generation, enabling the use of assets generated from an image for embodied AI simulations. However, the one-to-many nature of the image-to-3D problem limits their use due to inconsistent content and quality across views. Previous models optimize a 3D model by sampling views from a view-conditioned diffusion prior, but diffusion models cannot guarantee view consistency. Instead, we present ConsistentDreamer, where we first generate a set of fixed multi-view prior images and sample random views between them with another diffusion model through a score distillation sampling (SDS) loss. Thereby, we limit the discrepancies between the views guided by the SDS loss and ensure a consistent rough shape. In each iteration, we also use our generated multi-view prior images for fine-detail reconstruction. To balance between the rough shape and the fine-detail optimizations, we introduce dynamic task-dependent weights based on homoscedastic uncertainty, updated automatically in each iteration. Additionally, we employ opacity, depth distortion, and normal alignment losses to refine the surface for mesh extraction. Our method ensures better view consistency and visual quality compared to the state-of-the-art.
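The abstract's dynamic weights based on homoscedastic uncertainty can be illustrated with a small sketch. It assumes the commonly used formulation in which each task loss L_i gets a learnable log-variance s_i = log(sigma_i^2) and the total objective is the sum of 0.5 * exp(-s_i) * L_i + 0.5 * s_i. The paper's exact equations are not shown in this section, so the function names, the two-task setup (rough-shape SDS loss and fine-detail reconstruction loss), and the learning rate below are illustrative assumptions rather than the authors' implementation.

```python
import math

def combined_loss(losses, log_vars):
    """Uncertainty-weighted objective: sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i.

    `losses` are the per-task loss values (e.g. rough-shape and fine-detail),
    `log_vars` are the learnable s_i = log(sigma_i^2). Assumed formulation.
    """
    return sum(0.5 * math.exp(-s) * l + 0.5 * s for l, s in zip(losses, log_vars))

def update_log_vars(losses, log_vars, lr=0.1):
    """One gradient-descent step on each s_i, holding the task losses fixed.

    d/ds_i [0.5 * exp(-s_i) * L_i + 0.5 * s_i] = 0.5 - 0.5 * exp(-s_i) * L_i
    """
    grads = [0.5 - 0.5 * math.exp(-s) * l for l, s in zip(losses, log_vars)]
    return [s - lr * g for s, g in zip(log_vars, grads)]

def effective_weights(log_vars):
    """Multiplier applied to each task loss: 0.5 * exp(-s_i)."""
    return [0.5 * math.exp(-s) for s in log_vars]

if __name__ == "__main__":
    # Hypothetical loss values: a large rough-shape loss and a small fine-detail loss.
    losses = [4.0, 0.25]
    log_vars = [0.0, 0.0]
    for _ in range(5):
        log_vars = update_log_vars(losses, log_vars)
    print("weights after 5 steps:", effective_weights(log_vars))
```

Under this formulation the task with the larger loss is weighted down and the task with the smaller loss is weighted up, so neither term dominates the update. In an actual training loop the log-variances would typically be optimized jointly with the Gaussian parameters by the same optimizer, rather than with the hand-written step used here.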

Paper Structure

This paper contains 11 sections, 11 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: ConsistentDreamer for image-to-3D. Unlike prior optimization methods using only view-conditioned diffusion guidance or multi-view reconstruction, we utilize both for balanced optimization of rough shape and fine details, improving view consistency in content and visual quality.
  • Figure 2: ConsistentDreamer pipeline. ConsistentDreamer is a Gaussian-based method for view-consistent 3D generation from a single image, guided by consistent multi-view images generated in a prior stage. Rough shape is optimized by improving random views with a diffusion model conditioned on the closest prior view, while fine details are refined by comparing all prior views to the corresponding views of the representation. A balance between rough and fine optimizations is found with dynamic weights updated based on the final loss, and depth distortion, normal alignment, and opacity losses prepare the surface for mesh extraction.
  • Figure 3: Qualitative comparisons on image-to-3D mesh generation. We compare ConsistentDreamer against various methods using images from the internet. We include the input image and two diagonal views of each generated mesh from 45$\degree$ and 225$\degree$ azimuth angles. Our method gives the best results with consistent detail and color across views all around the object.
  • Figure 4: Sample meshes from our quantitative evaluation. We show sample meshes from the quantitative evaluation reported in Tables \ref{quantitive_table_gso}, \ref{quantitive_table_oo}, and \ref{quantitive_table_obja}. We include two diagonal views of each generated mesh from 45$\degree$ and 225$\degree$ azimuth angles, along with the corresponding ground-truth views.
  • Figure 5: SSIM and PSNR results. To analyze the behavior of SSIM and PSNR, we show back views of generated meshes, along with the front-view input image and the ground-truth back view. The results show that these metrics may not reflect similarity in detail level and clarity.
  • ...and 2 more figures
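
The Figure 2 caption describes conditioning the diffusion model for each random view on the closest prior view. A minimal sketch of that selection step follows, assuming views are compared by azimuth angle only and that the prior views sit at known azimuths. The function names and the four evenly spaced prior angles are hypothetical; the paper may also account for elevation or use a different distance.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two azimuth angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def closest_prior_view(azimuth_deg, prior_azimuths_deg):
    """Index of the prior view whose azimuth is nearest to the sampled view."""
    return min(
        range(len(prior_azimuths_deg)),
        key=lambda i: angular_distance(azimuth_deg, prior_azimuths_deg[i]),
    )

if __name__ == "__main__":
    # Hypothetical prior views at four evenly spaced azimuths.
    priors = [0.0, 90.0, 180.0, 270.0]
    for sampled in (350.0, 100.0, 200.0):
        idx = closest_prior_view(sampled, priors)
        print(f"sampled {sampled} deg -> prior view at {priors[idx]} deg")
```

The wrap-around in `angular_distance` matters for views near 0 degrees: a view sampled at 350 degrees is 10 degrees from the prior at 0 degrees, not 350.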