FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow
Hangyu Li, Xiangxiang Chu, Dingyuan Shi, Wang Lin
TL;DR
This paper addresses the over-smoothing and color-saturation issues in SDS-based text-to-3D generation by replacing the diffusion prior with a pretrained rectified flow model. It first formulates Vector Field Distillation Sampling (VFDS) to adapt SDS to rectified flow, then identifies the root causes of the remaining over-smoothing through an analysis of ODE trajectories. Building on this, FlowDreamer introduces a Unique Couple Matching (UCM) loss that uses a push-backward noise search, grounded in the reversibility and coupling properties of rectified flow, to constrain learning to a single trajectory. Empirically, FlowDreamer achieves higher fidelity and richer textual details with faster convergence for both NeRF and 3D Gaussian Splatting, outperforming prior SDS- and diffusion-based methods. It leaves open questions around initialization for NeRF and noise-search strategies. The approach offers a practical, faster, and higher-quality alternative to diffusion priors for text-to-3D generation, and it applies to multiple 3D representations.
Abstract
Text-to-3D generation has made significant progress in recent years. In particular, existing methods predominantly use pretrained diffusion models with Score Distillation Sampling (SDS) to train 3D models such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3D GS). However, these methods often produce over-smoothed textures and over-saturated colors. The rectified flow model, which uses a simple ordinary differential equation (ODE) to represent a straight trajectory, shows promise as an alternative prior for text-to-3D generation. It learns a time-independent vector field, thereby reducing the ambiguity in 3D model update gradients that the SDS framework computes from time-dependent scores. In light of this, we first develop a mathematical analysis to integrate SDS with the rectified flow model, which yields our initial framework, Vector Field Distillation Sampling (VFDS). However, empirical findings indicate that VFDS still produces over-smoothed results. We therefore analyze the underlying reasons for this failure from the perspective of ODE trajectories. Building on this analysis, we propose a novel framework, FlowDreamer, which yields high-fidelity results with richer textual details and faster convergence. The key insight is to leverage the coupling and reversibility properties of the rectified flow model to search for the corresponding noise, rather than using randomly sampled noise as in VFDS. Accordingly, we introduce a novel Unique Couple Matching (UCM) loss, which guides the 3D model to optimize along the same trajectory.
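The difference between randomly sampled noise and noise found by pushing an image backward along the ODE can be illustrated with a toy example. The sketch below is not the paper's implementation and not the UCM loss itself. It assumes a hypothetical constant velocity field (`toy_velocity`, `SHIFT`) in place of a pretrained text-conditioned rectified flow network, a time convention where t = 0 is noise and t = 1 is the image, and plain Euler integration. The residual it computes is a generic velocity-matching quantity chosen only for illustration.

```python
import numpy as np

# Hypothetical toy setup: every trajectory is a straight line that moves a
# noise sample by SHIFT as t goes from 0 (noise) to 1 (image).
SHIFT = np.array([2.0, -1.0])


def toy_velocity(x, t):
    """Stand-in for a pretrained rectified flow network (assumption).

    Returns a constant, time-independent velocity; a real model would be a
    learned network conditioned on a text prompt.
    """
    return np.broadcast_to(SHIFT, x.shape).copy()


def integrate(x, velocity, t_start, t_end, steps=10):
    """Euler-integrate dx/dt = velocity(x, t) from t_start to t_end."""
    dt = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x


def push_backward(image, velocity, steps=10):
    """Follow the ODE in reverse (t: 1 -> 0) to find the noise coupled to `image`."""
    return integrate(image, velocity, 1.0, 0.0, steps)


def velocity_residual(image, noise, velocity, t):
    """Illustrative residual: model velocity minus the straight-line direction.

    x_t lies on the segment between `noise` and `image`; the residual is zero
    when that segment coincides with the model's own trajectory.
    """
    x_t = t * image + (1.0 - t) * noise
    return velocity(x_t, t) - (image - noise)


rng = np.random.default_rng(0)
image = np.array([0.5, 0.25])            # stands in for a rendered view
coupled_noise = push_backward(image, toy_velocity)
random_noise = rng.standard_normal(2)    # noise drawn independently of the image

res_coupled = np.linalg.norm(velocity_residual(image, coupled_noise, toy_velocity, 0.3))
res_random = np.linalg.norm(velocity_residual(image, random_noise, toy_velocity, 0.3))
print("residual with coupled noise:", res_coupled)
print("residual with random noise: ", res_random)
```

In this toy setting the coupled noise gives a residual that is numerically zero, because the interpolation between noise and image lies exactly on the model's trajectory, whereas independently sampled noise pairs the image with a different trajectory and leaves a nonzero residual. A learned vector field is not constant, so a real noise search would need more care than this Euler loop suggests.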
