Regularized Schrödinger Bridge: Alleviating Distortion and Exposure Bias in Solving Inverse Problems

Qing Yao, Lijian Gao, Qirong Mao, Ming Dong

TL;DR

This work tackles two core limitations of diffusion-based solvers for inverse problems: the distortion–perception (DP) tradeoff and exposure bias from training–inference input mismatch. It introduces the Regularized Schrödinger Bridge (RSB), an SB-based framework that regularizes training with a time-varying interpolation between the posterior mean $X^*$ (minimal distortion) and the ground truth $X$ (optimal perception), while perturbing both inputs and targets to simulate inference-time errors. The approach yields a DP-regularized objective that guides predictions along the DP curve and exposes the model to predicted inputs, mitigating exposure bias. Empirically, RSB improves perceptual and distortion metrics across speech-denoising and dereverberation benchmarks (WSJ0+WHAM, WSJ0+Reverb, VoiceBank+DEMAND) and shows reduced exposure bias and a more favorable DP tradeoff relative to SB and other diffusion baselines. The results suggest that RSB may apply to ill-posed inverse problems beyond speech enhancement.

Abstract

Diffusion models serve as a powerful generative framework for solving inverse problems. However, they still face two key challenges: 1) the distortion-perception tradeoff, where improving perceptual quality often degrades reconstruction fidelity, and 2) the exposure bias problem, where the training-inference input mismatch leads to prediction error accumulation and reduced reconstruction quality. In this work, we propose the Regularized Schrödinger Bridge (RSB), an adaptation of the Schrödinger Bridge tailored for inverse problems that addresses the above limitations. RSB employs a novel regularized training strategy that perturbs both the input states and the targets, mitigating exposure bias by exposing the model to simulated prediction errors and alleviating distortion through an interpolation with the posterior mean. Extensive experiments on two typical inverse problems for speech enhancement demonstrate that RSB outperforms state-of-the-art methods, significantly improving distortion metrics and effectively reducing exposure bias.
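
The interpolation between the ground truth and the posterior mean can be pictured with a minimal sketch. The linear blend, the function name `interpolated_target`, and the weight `w` are assumptions made for illustration; this page does not give the paper's actual interpolation rule or its time schedule, so `w` is simply passed in by the caller.

```python
def interpolated_target(x, x_star, w):
    """Blend the ground truth x with the posterior mean x_star.

    Hypothetical linear rule: w = 0 returns x (the perceptual extreme),
    w = 1 returns x_star (the minimum-distortion extreme). In a
    time-varying scheme, w would be chosen per timestep; that schedule
    is not specified here and is left to the caller.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(x) != len(x_star):
        raise ValueError("x and x_star must have the same length")
    # Element-wise convex combination of the two signals.
    return [(1.0 - w) * a + w * b for a, b in zip(x, x_star)]


# Toy two-sample signals, purely illustrative.
target = interpolated_target([0.0, 2.0], [4.0, 2.0], 0.25)
```

Under this assumed rule, intermediate values of `w` trade perceptual quality against distortion, which is one way to read "guiding predictions along the DP curve".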

Paper Structure

This paper contains 30 sections, 8 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: DP tradeoff. The smaller the values on both axes, the better the solution quality.
  • Figure 2: Schematic illustration of Distortion-Perception Regularization in RSB, whose key component is the interpolation-based perturbation of both the training targets and the input states.
  • Figure 3: Prediction errors (a) and evaluation metrics (b) during 50-step sampling on WSJ0+WHAM. The prediction errors at each timestep are calculated as $\mathbb{E}\bigl[\lVert X_{\theta}(\cdot) - X\rVert_2^{2}\bigr]$.
  • Figure 4: Speech denoising performance on WSJ0+WHAM as a function of the number of sampling steps.
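
The per-timestep prediction error in the Figure 3 caption, $\mathbb{E}\bigl[\lVert X_{\theta}(\cdot) - X\rVert_2^{2}\bigr]$, can be estimated as a batch average of squared L2 distances. The sketch below is one plausible way to compute it, not the authors' evaluation code; the function name `prediction_error` and the representation of signals as flat Python lists are assumptions.

```python
def prediction_error(preds, targets):
    """Batch estimate of E[||X_theta - X||_2^2] at one sampling step.

    preds:   list of predicted signals, each a flat sequence of floats
    targets: list of clean reference signals with matching lengths
    """
    if not preds or len(preds) != len(targets):
        raise ValueError("need two non-empty batches of equal size")
    total = 0.0
    for p, x in zip(preds, targets):
        if len(p) != len(x):
            raise ValueError("prediction and reference lengths differ")
        # Squared L2 norm of the residual for this example.
        total += sum((a - b) ** 2 for a, b in zip(p, x))
    # Average over the batch approximates the expectation.
    return total / len(preds)


# Toy batch of two 2-sample signals: squared norms are 4 and 9.
err = prediction_error([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 4.0]])
```

Calling this once per sampling step, with the model's prediction at that step, would produce a curve of the kind shown in Figure 3(a).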