Target-Balanced Score Distillation

Zhou Xu, Qi Wang, Yuxiao Yang, Luyuan Zhang, Zhang Liang, Yang Li

TL;DR

Target-Balanced Score Distillation (TBSD) tackles the texture–geometry trade-off in SDS-based 3D generation by analyzing how Target Negative Prompts (TNP) influence texture realism and shape preservation. TBSD frames generation as a multi-objective optimization, combining a shape-guidance term with a texture-enhancement term and using an MGDA-inspired, time-varying weighting to shift focus from geometry to texture as training progresses. The approach injects target information via classifier-free guidance and uses a dynamic coefficient to stabilize geometry while enabling richer textures. Extensive 2D and 3D experiments show that TBSD outperforms prior SDS variants and baselines, delivering high-fidelity textures together with geometrically accurate shapes, as validated by CLIP scores and user studies. This work provides a practical, adaptable framework for high-quality 3D asset generation using diffusion priors.
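To make the "MGDA-inspired" weighting more concrete, the sketch below shows the standard closed-form MGDA solution for two objectives: the convex combination of two gradients with minimum norm. It is only an illustration of the generic technique. The names `g_shape`, `g_texture`, and `min_norm_weight` are hypothetical, and the actual time-varying weighting used by TBSD is not specified in this summary.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_weight(g_shape, g_texture):
    """Return alpha in [0, 1] minimizing ||alpha*g_shape + (1-alpha)*g_texture||^2.

    This is the generic two-objective MGDA closed form, not TBSD's schedule.
    """
    diff = [a - b for a, b in zip(g_shape, g_texture)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: any weight gives the same direction.
        return 0.5
    # Unconstrained minimizer, then clipped to the valid range [0, 1].
    alpha = dot([b - a for a, b in zip(g_shape, g_texture)], g_texture) / denom
    return min(1.0, max(0.0, alpha))


def combine(g_shape, g_texture):
    """Blend the two gradients with the min-norm weight."""
    alpha = min_norm_weight(g_shape, g_texture)
    blended = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_shape, g_texture)]
    return alpha, blended


if __name__ == "__main__":
    # Toy 2-D gradients standing in for shape and texture objectives.
    alpha, direction = combine([1.0, 0.0], [0.0, 1.0])
    print(alpha, direction)
```

When the two gradients conflict, the min-norm combination yields a direction that does not increase either objective to first order, which is why this family of methods is a natural fit for balancing geometry against texture.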

Abstract

Score Distillation Sampling (SDS) enables 3D asset generation by distilling priors from pretrained 2D text-to-image diffusion models, but vanilla SDS suffers from over-saturation and over-smoothing. To mitigate these issues, recent variants have incorporated negative prompts. However, these methods face a critical trade-off: either limited texture optimization, or significant texture gains accompanied by shape distortion. In this work, we first conduct a systematic analysis and reveal that this trade-off is fundamentally governed by how the negative prompts are used: Target Negative Prompts (TNP), which embed target information in the negative prompts, dramatically enhance texture realism and fidelity but induce shape distortions. Informed by this key insight, we introduce Target-Balanced Score Distillation (TBSD). It formulates generation as a multi-objective optimization problem and introduces an adaptive strategy that effectively resolves the aforementioned trade-off. Extensive experiments demonstrate that TBSD significantly outperforms existing state-of-the-art methods, yielding 3D assets with high-fidelity textures and geometrically accurate shapes.
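As background for how negative prompts enter score distillation, the sketch below shows the common practice of substituting the negative-prompt noise prediction for the unconditional branch of classifier-free guidance, followed by the vanilla SDS residual (guided prediction minus injected noise). This is a minimal illustration under stated assumptions, not the TBSD formulation: the function names, the toy vectors, and the scale value are hypothetical, and real noise predictions would come from a diffusion model's denoiser.

```python
def guided_noise(eps_pos, eps_neg, scale):
    """Classifier-free guidance with a negative prompt.

    eps_pos: noise predicted under the positive (target) prompt
    eps_neg: noise predicted under the negative prompt
    scale:   guidance scale; 1.0 returns eps_pos, 0.0 returns eps_neg
    """
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


def sds_residual(eps_pos, eps_neg, noise, scale):
    """Vanilla SDS residual: guided prediction minus the injected noise.

    In a full pipeline this residual would be weighted by a timestep-dependent
    factor and backpropagated through the renderer; both are omitted here.
    """
    guided = guided_noise(eps_pos, eps_neg, scale)
    return [g - e for g, e in zip(guided, noise)]


if __name__ == "__main__":
    # Toy vectors standing in for flattened noise predictions.
    eps_pos = [1.0, 2.0]
    eps_neg = [0.0, 1.0]
    noise = [0.5, 0.5]
    print(sds_residual(eps_pos, eps_neg, noise, scale=7.5))
```

Because the negative-prompt prediction sets the point that guidance pushes away from, changing what the negative prompt encodes changes the update direction, which is consistent with the abstract's observation that the choice of negative prompt governs the texture versus shape behavior.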

Paper Structure

This paper contains 39 sections, 20 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: Results obtained with our Target-Balanced Score Distillation (TBSD). Top: a gallery of images optimized with TBSD. Bottom: two rows of NeRFs generated by TBSD (other examples are included in the supplementary material).
  • Figure 1: Comparison results of Bridge with TNP and GNP. For images corresponding to the two sets of prompts, the leftmost column shows results generated by SDS in the first stage of Bridge, the right two columns show results optimized over time, and the prompts used are "A worn-out leather briefcase" and "A tumbaga pendant depicting a cat" from left to right.
  • Figure 2: Visualization of $\delta_{\mathrm{post}}^{\mathrm{gnp}}, \ \delta_{\mathrm{post}}^{\mathrm{tnp}}, \ \delta^{\mathrm{cls}}$ and $\delta_{\mathrm{Bridge}}$. Top-row images are generated by TBSD. Visualization is done by decoding each $\delta$ with the VAE decoder of Stable Diffusion.
  • Figure 2: Generation examples of TBSD (ours) with different seed values. Prompts are as follows: “A cauldron full of gold coins” (top row), “A plush dragon toy” (middle row), and “Pumpkin head zombie, skinny, highly detailed, photorealistic” (bottom row).
  • Figure 3: (a) Result images with varying levels of shape information controlled by coefficient $a$. (b) An overview of the proposed TBSD. Solid lines indicate reachable optimization paths, while dashed lines indicate unreachable ones. The prompts used for Figs. (a), (b) are "A cauldron full of gold coins", "A pineapple", respectively.
  • ...and 8 more figures