Adversarial Score Distillation: When score distillation meets GAN

Min Wei, Jingkai Zhou, Junyao Sun, Xuesong Zhang

TL;DR

Adversarial Score Distillation (ASD) maintains an optimizable discriminator and updates it with the complete optimization objective. It performs favorably against existing methods in 2D distillation and text-to-3D tasks.

Abstract

Existing score distillation methods are sensitive to the classifier-free guidance (CFG) scale: they exhibit over-smoothness or instability at small CFG scales and over-saturation at large ones. To explain and analyze these issues, we revisit the derivation of Score Distillation Sampling (SDS) and interpret existing score distillation through the Wasserstein Generative Adversarial Network (WGAN) paradigm. Under the WGAN paradigm, we find that existing score distillation either employs a fixed sub-optimal discriminator or conducts incomplete discriminator optimization, which results in this scale sensitivity. We propose Adversarial Score Distillation (ASD), which maintains an optimizable discriminator and updates it using the complete optimization objective. Experiments show that the proposed ASD performs favorably against existing methods in 2D distillation and text-to-3D tasks. Furthermore, to explore the generalization ability of our WGAN paradigm, we extend ASD to the image editing task, where it achieves competitive results. The project page and code are at https://github.com/2y7c3/ASD.
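To make the role of the CFG scale concrete, the sketch below writes out the standard classifier-free guidance combination and the resulting SDS update direction (the guided noise prediction minus the injected noise, before the timestep weight and the generator Jacobian). It is an illustration of plain SDS with CFG, not the ASD update itself. The function names, the random stand-in tensors, and the label "denoising term" are assumptions made for this example. Only "classifier term" follows the wording of the Figure 5 caption.

```python
import numpy as np


def cfg_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move from the unconditional noise
    prediction toward the conditional one by a factor `scale`."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


def sds_residual(eps_uncond, eps_cond, eps, scale):
    """SDS update direction for one sample, omitting the weight w(t)
    and the Jacobian of the generator: guided prediction minus noise."""
    return cfg_noise(eps_uncond, eps_cond, scale) - eps


def split_residual(eps_uncond, eps_cond, eps, scale):
    """The same residual written as two terms.

    denoising term  (hypothetical label): eps_uncond - eps
    classifier term (Figure 5 wording):   scale * (eps_cond - eps_uncond)
    """
    denoising = eps_uncond - eps
    classifier = scale * (eps_cond - eps_uncond)
    return denoising, classifier


if __name__ == "__main__":
    # Random arrays stand in for real diffusion-model outputs (assumption).
    rng = np.random.default_rng(0)
    shape = (4, 8, 8)
    eps = rng.standard_normal(shape)
    eps_uncond = eps + 0.1 * rng.standard_normal(shape)
    eps_cond = eps_uncond + 0.05 * rng.standard_normal(shape)

    for scale in (1.0, 7.5, 100.0):
        denoising, classifier = split_residual(eps_uncond, eps_cond, eps, scale)
        total = sds_residual(eps_uncond, eps_cond, eps, scale)
        # The two-term split reproduces the full residual exactly.
        assert np.allclose(total, denoising + classifier)
        share = np.linalg.norm(classifier) / (
            np.linalg.norm(classifier) + np.linalg.norm(denoising)
        )
        print(f"scale={scale:6.1f}  classifier-term share of norm: {share:.3f}")
```

Because the classifier term grows linearly with the scale while the other term does not, a very large CFG scale makes the update behave almost like the classifier term alone, which matches the observation that large scales tend toward over-saturated results.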

Paper Structure (14 sections, 17 equations, 13 figures, 2 tables, 1 algorithm)

Figures (13)

  • Figure 1: 2D score distillation examples with the prompts "a photograph of an astronaut riding a horse" and "exterior frontal perspective shot of resort villa inspired by Mykonos architecture". SDS is very sensitive to the CFG scale, while with VSD the generated content fluctuates during distillation at small CFG scales.
  • Figure 2: Examples generated by ASD in 2D distillation, image editing, and text-to-3D tasks. Stable Diffusion [sd_2_1_base, sd_2_1] is used as the pretrained diffusion model. For more results, please refer to our project page.
  • Figure 3: Workflow of ASD. Green lines show the pipeline of generator optimization. Orange lines show the pipeline of discriminator optimization. The NeRF avatar is adapted from [nerf]. See the supplementary material for the algorithm description.
  • Figure 4: 2D score distillation results with the prompts "hamburger" and "a monster truck". VSD$^\dagger$ denotes that the LoRA branch is updated with 50 steps per iteration, resulting in over-smoothed images similar to those of SDS.
  • Figure 5: Score distillation results with the prompts "a photograph of a fox" and "a front view of an owl" in 2D, and "a colorful rooster" and "a plush dragon toy" in 3D. Using only the classifier term is equivalent to using SDS with a huge CFG scale, which tends to produce over-saturated results in both 2D and 3D distillation. ASD can recover from an over-saturated initialization.
  • ...and 8 more figures