Implicit Image-to-Image Schrödinger Bridge for Image Restoration

Yuang Wang, Siyeop Yoon, Pengfei Jin, Matthew Tivnan, Sifan Song, Zhennong Chen, Rui Hu, Li Zhang, Quanzheng Li, Zhiqiang Chen, Dufan Wu

TL;DR

The Implicit Image-to-Image Schrödinger Bridge (I$^3$SB) accelerates the generative process of I$^2$SB: it reaches the same perceptual quality in fewer generative steps while maintaining or improving fidelity to the ground truth.

Abstract

Diffusion-based models have demonstrated remarkable effectiveness in image restoration tasks; however, their iterative denoising process, which starts from Gaussian noise, often leads to slow inference speeds. The Image-to-Image Schrödinger Bridge (I$^2$SB) offers a promising alternative by initializing the generative process from corrupted images while leveraging training techniques from score-based diffusion models. In this paper, we introduce the Implicit Image-to-Image Schrödinger Bridge (I$^3$SB) to further accelerate the generative process of I$^2$SB. I$^3$SB restructures the generative process into a non-Markovian framework by incorporating the initial corrupted image at each generative step, effectively preserving and utilizing its information. To enable direct use of pretrained I$^2$SB models without additional training, we ensure consistency in marginal distributions. Extensive experiments across many image corruptions, including noise, low resolution, JPEG compression, and sparse sampling, and multiple image modalities, such as natural, human face, and medical images, demonstrate the acceleration benefits of I$^3$SB. Compared to I$^2$SB, I$^3$SB achieves the same perceptual quality with fewer generative steps, while maintaining or improving fidelity to the ground truth.
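
The non-Markovian idea in the abstract can be illustrated with a DDIM-style update. What follows is a minimal sketch under stated assumptions, not the paper's exact equations. It assumes the Gaussian bridge marginal $q(x_n \mid x_0, x_N) = \mathcal{N}(\mu_n, \Sigma_n I)$ with $\mu_n = \frac{\bar\sigma_n^2}{\sigma_n^2 + \bar\sigma_n^2} x_0 + \frac{\sigma_n^2}{\sigma_n^2 + \bar\sigma_n^2} x_N$ and $\Sigma_n = \frac{\sigma_n^2 \bar\sigma_n^2}{\sigma_n^2 + \bar\sigma_n^2}$, and it assumes the step reuses the normalized residual of $x_{n+1}$ so that the marginal at step $n$ is unchanged. The names `bridge_moments` and `implicit_step` and the `(sigma2, sigma_bar2)` schedule tuples are hypothetical, and `x0_hat` stands in for the clean-image estimate a pretrained network would provide.

```python
import numpy as np


def bridge_moments(x0, xN, sigma2, sigma_bar2):
    """Mean and scalar variance of the assumed Gaussian bridge marginal."""
    total = sigma2 + sigma_bar2
    mean = (sigma_bar2 / total) * x0 + (sigma2 / total) * xN
    var = sigma2 * sigma_bar2 / total
    return mean, var


def implicit_step(x_next, x0_hat, xN, sched_n, sched_next, g_n=0.0, rng=None):
    """One hypothetical non-Markovian step from x_{n+1} to x_n.

    The corrupted image xN enters every step through both bridge means.
    g_n is the standard deviation of freshly injected noise; g_n = 0
    makes the step deterministic.
    """
    mu_n, var_n = bridge_moments(x0_hat, xN, *sched_n)
    mu_next, var_next = bridge_moments(x0_hat, xN, *sched_next)

    # Normalized residual of the current state (assumed form).
    if var_next > 0:
        direction = (x_next - mu_next) / np.sqrt(var_next)
    else:
        # At the corrupted endpoint the bridge variance is zero.
        direction = np.zeros_like(x_next)

    # Split the target variance between the reused residual and new noise.
    reused_std = np.sqrt(max(var_n - g_n ** 2, 0.0))
    if g_n > 0:
        rng = np.random.default_rng() if rng is None else rng
        noise = g_n * rng.standard_normal(x_next.shape)
    else:
        noise = 0.0
    return mu_n + reused_std * direction + noise
```

Under these assumptions, if $x_{n+1}$ has variance $\Sigma_{n+1}$ around $\mu_{n+1}$, the output has variance $(\Sigma_n - g_n^2) + g_n^2 = \Sigma_n$ around $\mu_n$. This is the kind of marginal consistency that would let a pretrained model be reused without retraining.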

Paper Structure

This paper contains 31 sections, 3 theorems, 47 equations, 8 figures, 3 tables, and 1 algorithm.

Key Result

Lemma 1

If $g_n$ is set to 0, then equation (eq:i3sb_xn_2) can be treated as an Euler discretization of the following ODE:

Figures (8)

  • Figure 1: Non-Markovian generative process of I$^3$SB. Solid arrows denote original dependencies in I$^2$SB, and dotted arrows signify additional dependencies in I$^3$SB.
  • Figure 2: FID-NFE and FID-SSIM curves for the sr4x-bicubic and JPEG-10 tasks in natural image experiments. Each point on the FID-SSIM curves represents the FID and SSIM values at a specific NFE, ranging from 1 to 100. As NFE increases, the FID-SSIM curves shift from the top-right to the bottom-left. Red curves correspond to I$^3$SB, and blue curves correspond to I$^2$SB.
  • Figure 3: FID-NFE and FID-SSIM curves for sr4x-bicubic and JPEG-10 tasks in human face experiments. Each point on the FID-SSIM curves represents the FID and SSIM values at a specific NFE, ranging from 2 to 100. As NFE increases, the FID-SSIM curves shift from the top-right to the bottom-left. Red curves correspond to I$^3$SB, and blue curves correspond to I$^2$SB.
  • Figure 4: FID-NFE and FID-SSIM curves for CT sparse view reconstruction, sr4x and denoising tasks in medical image experiments. Each point on the FID-SSIM curves represents the FID and SSIM values at a specific NFE, ranging from 2 to 200. As NFE increases, the FID-SSIM curves shift from the top-right to the bottom-left. Red curves correspond to I$^3$SB, and blue curves correspond to I$^2$SB.
  • Figure 5: Visualization result for the sr4x-bicubic task in the natural image experiments. Details within blue and yellow boxes are zoomed in for enhanced visual clarity. The NFE for I$^2$SB is 100, and for I$^3$SB is 25.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Lemma 1
  • Theorem 1
  • Lemma 2