
Unpaired Image-to-Image Translation via Neural Schrödinger Bridge

Beomsu Kim, Gihyun Kwon, Kwanyoung Kim, Jong Chul Ye

TL;DR

Diffusion models are powerful but constrained by Gaussian priors for unpaired image-to-image translation. The authors introduce Unpaired Neural Schrödinger Bridge (UNSB), which reframes Schrödinger Bridges as a sequence of adversarial transport problems with a KL-divergence constraint and uses a time-conditioned generator to learn a chain of conditional mappings. They diagnose the curse of dimensionality as the core bottleneck for SB in high dimensions and validate UNSB through toy sanity checks and large-scale I2I benchmarks (e.g., Horse2Zebra, Summer2Winter, Map2Satellite), where it outperforms GAN- and diffusion-based baselines. This approach enables scalable, multi-step SB-based translation and suggests a new direction for applying diffusion-style models to unpaired, high-resolution image translation tasks.

Abstract

Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. While diffusion models have achieved remarkable progress, they have limitations in unpaired image-to-image (I2I) translation tasks due to the Gaussian prior assumption. The Schrödinger Bridge (SB), which learns an SDE to translate between two arbitrary distributions, has emerged as an attractive solution to this problem. Yet, to the best of our knowledge, no SB model so far has been successful at unpaired translation between high-resolution images. In this work, we propose Unpaired Neural Schrödinger Bridge (UNSB), which expresses the SB problem as a sequence of adversarial learning problems. This allows us to incorporate advanced discriminators and regularization to learn an SB between unpaired data. We show that UNSB is scalable and successfully solves various unpaired I2I translation tasks. Code: https://github.com/cyclomon/UNSB

Paper Structure

This paper contains 20 sections, 3 theorems, 29 equations, 16 figures, and 11 tables.

Key Result

Theorem 1

For any $t_i$, consider the following constrained optimization problem [...] and define the distributions [...], where $s_{i+1} \coloneqq (t_{i+1} - t_i) / (1 - t_i)$ and [...]. If $\phi_i$ solves Eq. (eq:optim), then we have [...]
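
The ratio $s_{i+1}$ defined in Theorem 1 is the fraction of the remaining interval $[t_i, 1]$ that the step from $t_i$ to $t_{i+1}$ covers. The following is a minimal Python sketch of that arithmetic only. The uniform grid $t_i = i/N$ with $N = 5$ and the function name `step_fractions` are assumptions made for illustration and are not taken from the paper.

```python
from fractions import Fraction


def step_fractions(times):
    """Return s_{i+1} = (t_{i+1} - t_i) / (1 - t_i) for consecutive grid points.

    `times` is a hypothetical, strictly increasing time grid inside [0, 1].
    """
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")
    if times[0] < 0 or times[-1] > 1:
        raise ValueError("times must lie in [0, 1]")
    # Each ratio compares the step length with the time left until t = 1.
    return [(b - a) / (1 - a) for a, b in zip(times, times[1:])]


# Illustrative uniform grid t_i = i / N (an assumption, not the paper's setting).
N = 5
grid = [Fraction(i, N) for i in range(N + 1)]
ratios = step_fractions(grid)
print([str(s) for s in ratios])  # ['1/5', '1/4', '1/3', '1/2', '1']
```

On a uniform grid the ratio reduces to $s_{i+1} = 1/(N - i)$, so it grows with each step and equals 1 at the final step, where $t_{i+1} = 1$.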

Figures (16)

  • Figure 1: Left: Illustration of trajectories for Vanilla SB and UNSB. Due to the curse of dimensionality, observed data in high dimensions become sparse and fail to describe image manifolds accurately. Vanilla SB learns optimal transport between observed data, leading to undesirable mappings. UNSB employs adversarial learning and regularization to learn an optimal transport mapping which successfully generalizes beyond observed data. Right: UNSB can be interpreted as successively refining the predicted target domain image, enabling the model to modify fine details while preserving semantics. See Section \ref{sec:unsb}. Here, NFE stands for the number of function evaluations.
  • Figure 2: Curse of dimensionality.
  • Figure 3: Generation and training process of UNSB for time step $t_i$. sg means stop gradient.
  • Figure 4: Results on two shells.
  • Figure 5: Qualitative comparison of image-to-image translation results from our UNSB and baseline I2I methods. Compared to other one-step baseline methods, our model generates more realistic domain-changed outputs while preserving the structural information of the source images.
  • ...and 11 more figures

Theorems & Definitions (7)

  • Theorem 1
  • proof : Proof Sketch.
  • Lemma 1: Self-similarity
  • proof
  • Lemma 2: Static formulation of restricted SBs
  • proof
  • proof : Proof of Theorem 1