A Quantitative Evaluation of Score Distillation Sampling Based Text-to-3D
Xiaohan Fei, Chethan Parameshwara, Jiawei Mo, Xiaolong Li, Ashwin Swaminathan, CJ Taylor, Paolo Favaro, Stefano Soatto
TL;DR
This work targets the lack of quantitative metrics for SDS-based text-to-3D by introducing an objective evaluation protocol that measures the Janus problem, text-3D alignment, and realism, validated against human judgments. It formalizes the SDS objective and analyzes its core challenges, including viewpoint conditioning and nuisance variability, and demonstrates how a multiview diffusion framework can mitigate some failures. The authors propose a two-stage baseline combining Multiview Diffusion and Gaussian Splatting, with a refinement stage that fuses SDS signals from MVDream and Stable Diffusion to improve fidelity, while carefully managing the Janus trade-off. Empirically, the protocol reveals strengths and limitations of current methods, and the full approach achieves competitive alignment and realism with favorable efficiency, establishing a strong, reusable baseline for future text-to-3D research.
Abstract
The development of generative models that create 3D content from a text prompt has made considerable strides thanks to the use of the score distillation sampling (SDS) method on pre-trained diffusion models for image generation. However, the SDS method is also the source of several artifacts, such as the Janus problem, misalignment between the text prompt and the generated 3D model, and 3D model inaccuracies. While existing methods rely heavily on qualitative assessment of these artifacts through visual inspection of a limited set of samples, in this work we propose more objective quantitative evaluation metrics, which we cross-validate via human ratings, and we analyze the failure cases of the SDS technique. We demonstrate the effectiveness of this analysis by designing a novel, computationally efficient baseline model that achieves state-of-the-art performance on the proposed metrics while addressing all of the above-mentioned artifacts.
