ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching
Yumin Zhang, Xingyu Miao, Haoran Duan, Bo Wei, Tejal Shah, Yang Long, Rajiv Ranjan
TL;DR
This work tackles the fidelity gap in diffusion-based text-to-3D generation by addressing the reconstruction bias of DDIM inversion. It introduces Exact Score Matching (ESM), which uses auxiliary variables and a LoRA-implemented recovery path to achieve exact recovery in the DDIM reverse process, mitigating the accumulated errors that cause over-smoothing and content loss. Experiments on Gaussian Splatting-based 3D representations show improved detail and prompt alignment over strong baselines, supported by an analysis of hyperparameters and initialization effects. Despite these practical gains in high-fidelity 3D content, the authors acknowledge potential instability and sensitivity to settings that warrant further refinement.
Abstract
Text-to-3D content creation is a rapidly evolving research area. Given the scarcity of 3D data, current approaches often adapt pre-trained 2D diffusion models for 3D synthesis. Among these approaches, Score Distillation Sampling (SDS) has been widely adopted. However, over-smoothing significantly limits the high-fidelity generation of 3D models. To address this challenge, LucidDreamer replaces the Denoising Diffusion Probabilistic Model (DDPM) in SDS with the Denoising Diffusion Implicit Model (DDIM) to construct Interval Score Matching (ISM). However, ISM inevitably inherits inconsistencies from DDIM, causing reconstruction errors during the DDIM inversion process. These errors lead to poorly generated details and loss of content in the resulting 3D objects. To alleviate these problems, we propose a novel method named Exact Score Matching (ESM). Specifically, ESM leverages auxiliary variables to mathematically guarantee exact recovery in the DDIM reverse process. Furthermore, to effectively capture the dynamics of the original and auxiliary variables, these exact paths are implemented with a LoRA of a pre-trained diffusion model. Extensive experiments demonstrate the effectiveness of ESM in text-to-3D generation, particularly its superiority in generating fine details.
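To make the reconstruction error of DDIM inversion concrete, the following is a minimal numerical sketch, not the ESM formulation itself. It compares a plain DDIM inversion round trip with a simplified coupled update that carries an auxiliary variable, in the spirit of coupled-variable inversion schemes such as EDICT. The noise predictor `toy_eps`, the linear `alphas` schedule, and all function names are hypothetical choices made for illustration; the actual ESM update rules and the LoRA-based implementation are not reproduced here.

```python
import numpy as np


def toy_eps(x, t):
    """Hypothetical stand-in for a pre-trained noise predictor (nonlinear in x)."""
    return np.tanh(x + 0.1 * t)


def step_coeffs(a_from, a_to):
    """Write a deterministic DDIM step as x_to = c * x_from + d * eps."""
    c = np.sqrt(a_to / a_from)
    d = np.sqrt(1.0 - a_to) - c * np.sqrt(1.0 - a_from)
    return c, d


def ddim_roundtrip(x0, alphas):
    """Plain DDIM inversion (clean -> noisy) followed by DDIM denoising."""
    x = x0.copy()
    n = len(alphas)
    # Inversion evaluates eps at the current, less noisy point (an approximation).
    for i in range(n - 1):
        c, d = step_coeffs(alphas[i], alphas[i + 1])
        x = c * x + d * toy_eps(x, i)
    # Denoising evaluates eps at the noisier point, so the steps do not cancel.
    for i in range(n - 1, 0, -1):
        c, d = step_coeffs(alphas[i], alphas[i - 1])
        x = c * x + d * toy_eps(x, i)
    return x


def coupled_denoise(x, y, c, d, t):
    """One denoising step on the pair (x, y); y is the auxiliary variable."""
    x_new = c * x + d * toy_eps(y, t)
    y_new = c * y + d * toy_eps(x_new, t)
    return x_new, y_new


def coupled_invert(x_new, y_new, c, d, t):
    """Algebraic inverse of coupled_denoise, solved in the opposite order."""
    y = (y_new - d * toy_eps(x_new, t)) / c
    x = (x_new - d * toy_eps(y, t)) / c
    return x, y


def coupled_roundtrip(x0, alphas):
    """Invert with the auxiliary variable, then denoise along the same path."""
    x, y = x0.copy(), x0.copy()
    n = len(alphas)
    for i in range(n - 1):
        c, d = step_coeffs(alphas[i + 1], alphas[i])
        x, y = coupled_invert(x, y, c, d, i + 1)
    for i in range(n - 2, -1, -1):
        c, d = step_coeffs(alphas[i + 1], alphas[i])
        x, y = coupled_denoise(x, y, c, d, i + 1)
    return x


rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)
# Assumed schedule: cumulative alpha from nearly clean (0.999) to noisy (0.05).
alphas = np.linspace(0.999, 0.05, 10)
ddim_err = np.max(np.abs(ddim_roundtrip(x0, alphas) - x0))
coupled_err = np.max(np.abs(coupled_roundtrip(x0, alphas) - x0))
print(f"DDIM round-trip error:    {ddim_err:.3e}")
print(f"coupled round-trip error: {coupled_err:.3e}")
```

In this toy setting the plain round trip does not return to the starting point, because each inversion step and its matching denoising step query the noise predictor at different points. The coupled update is invertible by construction, so its round trip recovers the input up to floating-point precision.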
