ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
Zhenglin Zhou, Fan Ma, Xiaobo Xia, Hehe Fan, Yi Yang, Tat-Seng Chua
TL;DR
ITS3D introduces an inference-time scaling framework for text-guided 3D diffusion models by optimizing the initial Gaussian noise input through a verifier-guided search. It stabilizes and accelerates search with Gaussian normalization, compresses the high-dimensional search space via SVD, and sustains exploration with a singular-space reset. Across GPTEval3D, ITS3D achieves state-of-the-art gains on human-preference, image-text alignment, and comprehensive 3D quality metrics without additional training. The approach demonstrates the practical value of structured, search-based inference-time optimization for 3D generation and points toward semantic-aware verifiers as a promising future direction.
Abstract
We explore inference-time scaling in text-guided 3D diffusion models to enhance generative quality without additional training. To this end, we introduce ITS3D, a framework that formulates the task as an optimization problem: identifying the most effective Gaussian noise input. The framework is driven by a verifier-guided search algorithm, which iteratively refines noise candidates based on verifier feedback. To address the inherent challenges of 3D generation, we introduce three techniques for improved stability, efficiency, and exploration capability. 1) Gaussian normalization stabilizes the search process by correcting the distribution shift that arises when noise candidates deviate from a standard Gaussian distribution during iterative updates. 2) Because the high-dimensional 3D search space increases computational complexity, a singular value decomposition (SVD)-based compression technique reduces dimensionality while preserving effective search directions. 3) To further prevent convergence to suboptimal local minima, a singular space reset mechanism dynamically updates the search space based on diversity measures. Extensive experiments demonstrate that ITS3D enhances text-to-3D generation quality, showing the potential of computationally efficient search methods in generative processes. The source code is available at https://github.com/ZhenglinZhou/ITS3D.
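
To make the three techniques in the abstract concrete, the following is a minimal NumPy sketch of a verifier-guided noise search. It is an illustration under stated assumptions, not the ITS3D implementation: the function names (gaussian_normalize, svd_basis, search_noise), the random-perturbation search rule, the way the SVD basis is built from random samples, and the score-spread test used as a diversity measure are all hypothetical choices made for this example. The verifier is any callable that maps a noise vector to a scalar score; in the real setting it would score the 3D asset generated from that noise.

```python
import numpy as np

def gaussian_normalize(z):
    """Re-standardize a noise candidate to zero mean and unit variance."""
    return (z - z.mean()) / (z.std() + 1e-8)

def svd_basis(samples, k):
    """Top-k right singular vectors of a (n_samples, dim) matrix (assumed construction)."""
    _, _, vt = np.linalg.svd(samples, full_matrices=False)
    return vt[:k]  # shape (k, dim), orthonormal rows

def search_noise(verifier, dim, k=8, pop=16, iters=30, step=0.5,
                 reset_tol=1e-3, seed=0):
    """Toy verifier-guided search over the initial noise (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    best = gaussian_normalize(rng.standard_normal(dim))
    best_score = verifier(best)
    # Compressed search space: a k-dimensional subspace of the dim-dimensional noise.
    basis = svd_basis(rng.standard_normal((pop, dim)), k)
    for _ in range(iters):
        # Propose candidates by perturbing the current best inside the subspace.
        coeffs = rng.standard_normal((pop, k))
        cands = np.stack([gaussian_normalize(best + step * (c @ basis))
                          for c in coeffs])
        scores = np.array([verifier(c) for c in cands])
        i = int(scores.argmax())
        if scores[i] > best_score:
            best, best_score = cands[i], float(scores[i])
        # Assumed diversity measure: if candidate scores barely differ,
        # reset the subspace to explore new directions.
        if scores.std() < reset_tol:
            basis = svd_basis(rng.standard_normal((pop, dim)), k)
    return best, best_score
```

As a usage example, a stand-in verifier such as `lambda z: float(z @ target) / z.size` for a fixed random `target` vector can be passed to `search_noise(verifier, dim=64)`. Because a candidate replaces the current best only when its score is higher, the returned score is never lower than that of the initial noise, and every returned candidate has been re-standardized by `gaussian_normalize`.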
