One step further with Monte-Carlo sampler to guide diffusion better

Minsi Ren, Wenhao Deng, Ruiqi Feng, Tailin Wu

TL;DR

Experimental results demonstrate that the proposed plug-and-play adjustment strategy can be used effectively with higher-order samplers and consistently improves the quality of generated samples across all scenarios.

Abstract

Stochastic differential equation (SDE)-based generative models have achieved substantial progress in conditional generation via training-free, differentiable loss-guided approaches. However, existing methods that rely on posterior sampling typically suffer from a substantial estimation error, which yields inaccurate guidance gradients and leads to inconsistent generation results. To mitigate this issue, we propose performing an additional backward denoising step and Monte-Carlo sampling (ABMS), a plug-and-play adjustment strategy that achieves better guided diffusion. To verify the effectiveness of our method, we provide theoretical analysis and propose a dual-focus evaluation framework, which further highlights the critical problem of cross-condition interference prevalent in existing approaches. We conduct experiments across various task settings and data types, including conditional online handwritten trajectory generation, image inverse problems (inpainting, super-resolution, and Gaussian deblurring), and molecular inverse design, among others. Experimental results demonstrate that our approach can be used effectively with higher-order samplers and consistently improves the quality of generated samples across all scenarios.
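The estimation error described in the abstract can be illustrated with a toy sketch. This is not the ABMS algorithm. It assumes a one-dimensional prior $x_0 \sim \mathcal{N}(0, 1)$, a noised observation $x_t = \alpha x_0 + \sigma \epsilon$, and a hypothetical loss $(x_0^2 - y)^2$, so that the posterior $p(x_0 \mid x_t)$ is Gaussian in closed form. All function names and parameter values are illustrative. The sketch compares the guidance gradient obtained by evaluating the loss at the posterior mean with a Monte-Carlo estimate of the gradient of the expected loss.

```python
import numpy as np


def posterior_mean_var(x_t, alpha, sigma):
    """Closed-form p(x0 | x_t) for prior x0 ~ N(0, 1), x_t = alpha*x0 + sigma*eps."""
    denom = alpha**2 + sigma**2
    return alpha * x_t / denom, sigma**2 / denom


def loss_grad_x0(x0, y):
    """Derivative of the toy loss (x0**2 - y)**2 with respect to x0."""
    return 4.0 * x0 * (x0**2 - y)


def guidance_gradients(x_t, y, alpha=0.6, sigma=0.8, n_samples=200_000, seed=0):
    """Return (point-estimate, Monte-Carlo, exact) gradients with respect to x_t."""
    mean, var = posterior_mean_var(x_t, alpha, sigma)
    # The posterior variance does not depend on x_t in this toy model,
    # so d x0 / d x_t equals d mean / d x_t under reparameterization.
    dmean_dxt = alpha / (alpha**2 + sigma**2)

    # Point estimate: gradient of the loss evaluated at the posterior mean.
    point = loss_grad_x0(mean, y) * dmean_dxt

    # Monte-Carlo estimate: average the loss gradient over posterior samples.
    rng = np.random.default_rng(seed)
    x0 = mean + np.sqrt(var) * rng.standard_normal(n_samples)
    mc = loss_grad_x0(x0, y).mean() * dmean_dxt

    # Exact gradient of E[(x0**2 - y)**2 | x_t], from Gaussian moments.
    exact = (loss_grad_x0(mean, y) + 12.0 * mean * var) * dmean_dxt
    return point, mc, exact


if __name__ == "__main__":
    point, mc, exact = guidance_gradients(x_t=1.5, y=1.0)
    print(f"point estimate: {point:.4f}")
    print(f"Monte-Carlo:    {mc:.4f}")
    print(f"exact:          {exact:.4f}")
```

In this toy setting the point-estimate gradient differs from the exact gradient by a term proportional to the posterior variance, while the Monte-Carlo estimate approaches the exact value as the number of samples grows.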

Paper Structure (31 sections, 1 theorem, 31 equations, 7 figures, 4 tables, 1 algorithm)

This paper contains 31 sections, 1 theorem, 31 equations, 7 figures, 4 tables, 1 algorithm.

Key Result

Lemma 1

Let $f: \mathbb{R}^d \to \mathbb{R}$ be a continuously differentiable function with $L$-Lipschitz gradient, i.e., $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y \in \mathbb{R}^d$. Then for any random variable $X$ with finite second moment, the Jensen gap satisfies:
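A standard bound of this type is sketched here as an assumption, since the paper's exact constant and form may differ. It follows from the two-sided quadratic bound implied by an $L$-Lipschitz gradient. The symbol $\mu$ is notation introduced for this sketch only.

```latex
\begin{align*}
\mu &:= \mathbb{E}[X], \\
\bigl| f(x) - f(\mu) - \langle \nabla f(\mu),\, x - \mu \rangle \bigr|
  &\le \frac{L}{2}\,\|x - \mu\|^2
  \quad \text{for all } x \in \mathbb{R}^d, \\
\bigl| \mathbb{E}[f(X)] - f(\mu) \bigr|
  &\le \frac{L}{2}\,\mathbb{E}\bigl[\|X - \mu\|^2\bigr].
\end{align*}
```

The last line is obtained by setting $x = X$ in the second line and taking expectations: the linear term vanishes because $\mathbb{E}[X - \mu] = 0$, and the finite second moment keeps the right-hand side finite.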

Figures (7)

  • Figure 1: Visual comparison of results, where different colors represent different strokes. The guidance scale is set to 0.1. Even without manifold deviation, the fonts generated by DSG tend to have connected strokes, regardless of the target writing style. In contrast, our method better preserves the style characteristics.
  • Figure 2: Performance curves of Distance Metric vs. FID Metric. We select different guidance scales for each method to obtain performance trend curves. Our method achieves a better guidance effect and is more robust to the choice of guidance scale.
  • Figure 3: Qualitative results of text-style guidance. The text input is "A corgi wearing a wizard hat". Our method generates much clearer, higher-quality results than the baseline method across all target styles.
  • Figure 4: Qualitative results of content guidance. Compared to the baseline model, the guidance we apply corrects the detailed strokes of the generated characters, making the character structures more accurate. The parts with structural errors are marked with black boxes.
  • Figure 5: Performance curves of Distance Metric vs. FID Metric on the Gaussian deblurring task.
  • ...and 2 more figures

Theorems & Definitions (3)

  • proof
  • Lemma 1: Jensen Gap Upper Bound
  • proof