Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution

Yiyang Ma, Huan Yang, Wenhan Yang, Jianlong Fu, Jiaying Liu

TL;DR

This paper tackles the instability of diffusion-based image super-resolution (SR) caused by randomness in the reverse diffusion process. It analyzes diffusion-ODE sampling and derives an approximately optimal boundary condition $\tilde{\mathbf{x}}_T$ that yields deterministic, high-quality samples via $\mathbf{x}_0 = h_\theta(\tilde{\mathbf{x}}_T, \mathbf{y})$, where $\tilde{\mathbf{x}}_T$ is largely independent of the low-resolution (LR) input $\mathbf{y}$. The authors approximate $\tilde{\mathbf{x}}_T$ by minimizing an LPIPS-based objective over a small reference set $\mathcal{R}$ of HR-LR image pairs: they draw candidate boundary conditions by Monte Carlo sampling, keep the one with the lowest objective value, and then apply it to new LR images in a plug-and-play fashion. Empirically, few-step diffusion-ODE sampling with $\tilde{\mathbf{x}}_T$ outperforms existing sampling methods on both bicubic-SR and real-SR benchmarks without any additional training. The approach offers a practical, model-agnostic means to stabilize and improve the output of pre-trained diffusion SR models at reduced computation, and it may extend to other low-level vision tasks.
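The selection procedure summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_boundary_condition`, `toy_solve_ode`, and `toy_distance` are hypothetical names, the toy solver stands in for $h_\theta$, and mean squared error stands in for LPIPS so that the example runs without a diffusion model. Drawing candidates from a standard Gaussian is also an assumption.

```python
import numpy as np


def select_boundary_condition(solve_ode, distance, reference_pairs, shape,
                              num_candidates, seed=0):
    """Monte Carlo search for one boundary condition x_T shared by all inputs.

    solve_ode(x_T, lr) -> SR image  (hypothetical stand-in for h_theta)
    distance(sr, hr)   -> float     (hypothetical stand-in for LPIPS)
    reference_pairs: list of (hr, lr) arrays playing the role of the set R
    """
    rng = np.random.default_rng(seed)
    best_x_T, best_score = None, np.inf
    for _ in range(num_candidates):
        # Draw one candidate boundary condition (assumed standard Gaussian).
        x_T = rng.standard_normal(shape)
        # Average the distance to the HR targets over the whole reference set.
        score = float(np.mean([distance(solve_ode(x_T, lr), hr)
                               for hr, lr in reference_pairs]))
        if score < best_score:
            best_x_T, best_score = x_T, score
    return best_x_T, best_score


# Toy stand-ins so the sketch runs without a diffusion model (assumptions).
def toy_solve_ode(x_T, lr):
    # Nearest-neighbour 2x upsampling plus a small x_T-dependent perturbation.
    upsampled = np.kron(lr, np.ones((2, 2)))
    return upsampled + 0.1 * x_T


def toy_distance(sr, hr):
    # Mean squared error, used here only as a placeholder for LPIPS.
    return float(np.mean((sr - hr) ** 2))


rng = np.random.default_rng(1)
reference_pairs = []
for _ in range(4):
    hr = rng.random((8, 8))
    lr = hr.reshape(4, 2, 4, 2).mean(axis=(1, 3))  # 2x average-pool downsampling
    reference_pairs.append((hr, lr))

x_T_best, score_best = select_boundary_condition(
    toy_solve_ode, toy_distance, reference_pairs, shape=(8, 8), num_candidates=50)
print(x_T_best.shape, score_best)
```

Once selected, the same `x_T_best` would be reused for every new LR input, which is what makes sampling deterministic. In a real setting the cost is dominated by `solve_ode`, which is called once per candidate and reference pair, so the number of candidates and the size of the reference set trade search quality against compute.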

Abstract

Diffusion models, a powerful class of generative models, have achieved impressive results on image super-resolution (SR) tasks. However, because of the randomness introduced in the reverse process of diffusion models, the performance of diffusion-based SR models fluctuates from one sampling run to the next, especially for samplers with few resampled steps. This inherent randomness makes diffusion-based SR inefficient and unstable, so users cannot easily guarantee the quality of SR results. Our work instead treats this randomness as an opportunity: fully analyzing and leveraging it leads to an effective plug-and-play sampling method that has the potential to benefit a range of diffusion-based SR methods. In more detail, we propose to steadily sample high-quality SR images from pre-trained diffusion-based SR models by solving diffusion ordinary differential equations (diffusion ODEs) with optimal boundary conditions (BCs), and we analyze the relationship between the choice of BC and the corresponding SR result. Our analysis shows how to obtain an approximately optimal BC via an efficient exploration of the whole space. SR results sampled by the proposed method with fewer steps are of higher quality than results sampled with randomness by current methods from the same pre-trained diffusion-based SR model, which means that our sampling method "boosts" current diffusion-based SR models without any additional training.
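The search for an approximately optimal BC described in the TL;DR can be written compactly. The following is a sketch of that objective rather than a formula quoted from the paper: the symbols $\mathbf{z}_i$ (an HR reference image paired with the LR image $\mathbf{y}_i$), $\mathcal{K}$ (the finite set of randomly drawn candidate BCs), and the standard Gaussian sampling distribution are notation and assumptions introduced here.

```latex
\tilde{\mathbf{x}}_T \;\approx\;
\operatorname*{arg\,min}_{\mathbf{x}_T \in \mathcal{K}}
\sum_{(\mathbf{z}_i,\, \mathbf{y}_i) \in \mathcal{R}}
\mathrm{LPIPS}\!\big(h_\theta(\mathbf{x}_T, \mathbf{y}_i),\, \mathbf{z}_i\big),
\qquad
\mathcal{K} = \big\{\mathbf{x}_T^{(k)}\big\}_{k=1}^{K},
\quad
\mathbf{x}_T^{(k)} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
```

Because the minimization runs over a finite candidate set and a small reference set, it requires only forward passes of the pre-trained model and no gradient updates, which is consistent with the claim that no additional training is needed.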

Paper Structure

This paper contains 21 sections, 27 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Given a well-trained diffusion-based SR model, we can sample reasonable SR results by solving diffusion ODEs with different BCs $\mathbf{x}_T$, as the figure shows. However, performance is unstable across BCs $\mathbf{x}_T$. We find an approximately optimal BC $\tilde{\mathbf{x}}_T$ that the solution $h_\theta(\tilde{\mathbf{x}}_T, \mathbf{y})$ to the diffusion ODE projects to the sample $\tilde{\mathbf{x}}_0$ with nearly the highest probability density. Based on our analysis in Sec. \ref{subsec: analyzing}, $\tilde{\mathbf{x}}_T$ is shared by different LR images $\mathbf{y}_i$. The method for finding $\tilde{\mathbf{x}}_T$ is described in Sec. \ref{subsec: approximating}. [Zoom in for best view]
  • Figure 2: Qualitative comparisons of bicubic-SR results obtained by different methods. "RSRGAN" denotes RankSRGAN \cite{2019RankSRGAN}. All images to the right of the black line are sampled from the same vanilla diffusion-based SR model trained by us. [Zoom in for best view]
  • Figure 3: Ablation on the values of $R$ and $K$. Shaded regions denote the standard deviation. The red dotted lines denote the LPIPS of SR samples of the subset obtained by DDIM-50 with randomly sampled $\mathbf{x}_T$, indicating the lower bound of performance; the green dotted lines denote the LPIPS of SR results of the subset obtained by DDIM-50 with $\tilde{\mathbf{x}}_T$, indicating the upper bound of performance.
  • Figure 4: SR results with shared $\mathbf{x}_T$. Results with ${\mathbf{x}_T}_1$ all have excessive artifacts and results with ${\mathbf{x}_T}_2$ are all over-smooth. Results with shared $\mathbf{x}_T$ share visual features. [Zoom in for best view]
  • Figure 5: Further visual comparisons. [Zoom in for best view]
  • ...and 2 more figures