Constrained Diffusion with Trust Sampling

William Huang, Yifeng Jiang, Tom Van Wouwe, C. Karen Liu

TL;DR

This work formulates a series of constrained optimizations throughout the inference process of a diffusion model, allowing the sample to take multiple steps along the gradient of a proxy constraint function until the proxy can no longer be trusted, as determined by the variance at each diffusion level.
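The inner loop described in the TL;DR can be illustrated with a small sketch. It assumes a toy data distribution x0 ~ N(0, I), for which Tweedie's posterior-mean estimate is simply x0_hat = sqrt(ᾱ_t) x_t. It also assumes a trust rule in which the total distance moved at one diffusion level is capped at a multiple of the noise standard deviation σ_t = sqrt(1 - ᾱ_t). The names `trust_steps` and `make_toy_loss_grad` and the parameters `eta` and `trust_scale` are hypothetical. This is not the authors' implementation, and the paper's exact trust schedule may differ.

```python
import math

def make_toy_loss_grad(alpha_bar, target):
    # Hypothetical toy setup: data x0 ~ N(0, I) under x_t = sqrt(a) x0 + sqrt(1 - a) eps,
    # so Tweedie's estimate is x0_hat = sqrt(a) * x_t. The proxy constraint is
    # L = 0.5 * (x0_hat[0] - target)^2; this returns dL/dx_t.
    s = math.sqrt(alpha_bar)
    def loss_grad(x):
        return [s * (s * x[0] - target)] + [0.0] * (len(x) - 1)
    return loss_grad

def trust_steps(x, sigma, loss_grad, eta=0.1, trust_scale=1.0, tol=1e-12):
    """Take repeated fixed-length steps down the proxy gradient at one level.

    Assumed trust rule: the total path length may not exceed trust_scale * sigma,
    where sigma is the noise standard deviation at this diffusion level.
    """
    budget = trust_scale * sigma
    travelled = 0.0
    steps = 0
    while travelled + eta <= budget:
        g = loss_grad(x)
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        if g_norm < tol:
            break  # proxy gradient vanished: constraint locally satisfied
        # Unit-direction step of length eta, so each step costs eta of the budget.
        x = [xi - eta * gi / g_norm for xi, gi in zip(x, g)]
        travelled += eta
        steps += 1
    return x, steps

for alpha_bar in (0.1, 0.9):
    sigma = math.sqrt(1.0 - alpha_bar)
    _, n = trust_steps([0.0, 0.0], sigma, make_toy_loss_grad(alpha_bar, 3.0))
    print(alpha_bar, n)  # prints "0.1 9", then "0.9 3"
```

Under this assumed rule, the noisier level (ᾱ = 0.1) permits nine proxy steps while the cleaner level (ᾱ = 0.9) permits three. The sample is therefore allowed to move farther along the proxy gradient where the diffusion variance is larger.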

Abstract

Diffusion models have demonstrated significant promise in various generative tasks; however, they often struggle to satisfy challenging constraints. Our approach addresses this limitation by rethinking training-free loss-guided diffusion from an optimization perspective. We formulate a series of constrained optimizations throughout the inference process of a diffusion model. In each optimization, we allow the sample to take multiple steps along the gradient of the proxy constraint function until we can no longer trust the proxy, according to the variance at each diffusion level. Additionally, we estimate the state manifold of the diffusion model to allow for early termination when the sample starts to wander away from the state manifold at each diffusion step. Trust sampling effectively balances following the unconditional diffusion model and adhering to the loss guidance, enabling more flexible and accurate constrained generation. We demonstrate the efficacy of our method through extensive experiments on complex tasks in the drastically different domains of image and 3D motion generation, showing significant improvements over existing methods in terms of generation quality. Our implementation is available at https://github.com/will-s-h/trust-sampling.
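The early-termination idea in the abstract can be sketched in the same toy setting. The sketch assumes that leaving the state manifold is detected when the squared norm of the predicted noise ε̂(x_t) exceeds a threshold `eps_max`. For x0 ~ N(0, I), the optimal noise prediction is E[ε | x_t] = sqrt(1 - ᾱ_t) x_t. The function `guided_steps`, the threshold, and the step size are illustrative assumptions, not the authors' released code.

```python
import math

def predicted_noise(x, alpha_bar):
    # Hypothetical toy noise predictor: for x0 ~ N(0, I) and
    # x_t = sqrt(a) x0 + sqrt(1 - a) eps, the optimal prediction is
    # E[eps | x_t] = sqrt(1 - a) * x_t.
    return [math.sqrt(1.0 - alpha_bar) * xi for xi in x]

def guided_steps(x, alpha_bar, loss_grad, eta=0.1, max_steps=100, eps_max=1.0):
    """Gradient steps on a proxy loss at one diffusion level, with early exit.

    Assumed manifold test: stop as soon as ||predicted noise||^2 > eps_max.
    Returns (x, steps_taken, reason).
    """
    for step in range(max_steps):
        eps_hat = predicted_noise(x, alpha_bar)
        if sum(e * e for e in eps_hat) > eps_max:
            return x, step, "left manifold"
        g = loss_grad(x)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x, max_steps, "step budget exhausted"

alpha_bar, target = 0.5, 5.0
s = math.sqrt(alpha_bar)
# Toy proxy loss 0.5 * (x0_hat[0] - target)^2 with x0_hat = sqrt(alpha_bar) * x_t.
loss_grad = lambda x: [s * (s * x[0] - target), 0.0]

x, n, reason = guided_steps([0.0, 0.0], alpha_bar, loss_grad, eta=0.5, eps_max=1.0)
print(n, reason)  # prints "1 left manifold"
```

With `eps_max = 1.0`, the loop stops after a single step even though the proxy loss is not yet minimized. With a very large threshold, it runs the full step budget and approaches the proxy minimizer x_t[0] = target / sqrt(ᾱ). Because the check runs before each step, the returned sample may sit just past the threshold; a variant could instead revert that last step.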

Paper Structure

This paper contains 31 sections, 13 equations, 6 figures, 7 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Trust Sampling can be applied to complex constraint problems in drastically different domains.
  • Figure 2: Results on solving linear inverse problems. The left shows examples of box inpainting; the right shows examples of super-resolution.
  • Figure 3: Qualitative results for Trust on Gaussian Deblurring. The first two rows of images are from FFHQ, and the latter two rows of images are from ImageNet.
  • Figure 4: Qualitative results for Trust on Box Inpainting. The first two rows of images are from FFHQ, and the latter two rows of images are from ImageNet.
  • Figure 5: Qualitative results for Trust on Super-Resolution. The first two rows of images are from FFHQ, and the latter two rows of images are from ImageNet.
  • ...and 1 more figure