Regularized Schrödinger Bridge: Alleviating Distortion and Exposure Bias in Solving Inverse Problems
Qing Yao, Lijian Gao, Qirong Mao, Ming Dong
TL;DR
This work tackles two core limitations of diffusion-based solvers for inverse problems: the distortion–perception (DP) tradeoff and exposure bias caused by the mismatch between training and inference inputs. It introduces the Regularized Schrödinger Bridge (RSB), an SB-based framework that regularizes training with a time-varying interpolation between the posterior mean $X^*$ (minimal distortion) and the ground truth $X$ (optimal perception), while perturbing both inputs and targets to simulate inference-time errors. The approach yields a DP-regularized objective that guides predictions along the DP curve and exposes the model to simulated prediction errors during training, mitigating exposure bias. Empirically, RSB improves perceptual and distortion metrics on speech denoising and dereverberation benchmarks (WSJ0+WHAM, WSJ0+Reverb, VoiceBank+DEMAND) and shows reduced exposure bias and a more favorable DP tradeoff relative to SB and other diffusion baselines. The results suggest that RSB may also apply to ill-posed inverse problems beyond speech enhancement.
Abstract
Diffusion models serve as a powerful generative framework for solving inverse problems. However, they still face two key challenges: 1) the distortion-perception tradeoff, where improving perceptual quality often degrades reconstruction fidelity, and 2) the exposure bias problem, where the mismatch between training and inference inputs leads to accumulated prediction error and reduced reconstruction quality. In this work, we propose the Regularized Schrödinger Bridge (RSB), an adaptation of the Schrödinger Bridge tailored for inverse problems that addresses both limitations. RSB employs a novel regularized training strategy that perturbs both the input states and the targets. This mitigates exposure bias by exposing the model to simulated prediction errors, and it alleviates distortion by interpolating the targets with the posterior mean. Extensive experiments on two typical inverse problems for speech enhancement demonstrate that RSB outperforms state-of-the-art methods, significantly improving distortion metrics and effectively reducing exposure bias.
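To make the interpolation idea concrete, the following is a minimal sketch of a time-varying training target that blends the posterior mean $X^*$ with the ground truth $X$. It is an illustration under stated assumptions, not the authors' formulation: the function names, the linear weight schedule, and the convention that the weight moves from the posterior mean at t = 0 to the ground truth at t = 1 are all hypothetical, and the perturbation of inputs and targets is not modeled here.

```python
from typing import Callable, List, Sequence


def linear_weight(t: float) -> float:
    """Hypothetical schedule: weight on the ground truth, clamped to [0, 1].

    The actual schedule and its direction in time are not given in the
    text above; a linear ramp is assumed purely for illustration.
    """
    return min(1.0, max(0.0, t))


def interpolated_target(
    x_star: Sequence[float],
    x: Sequence[float],
    t: float,
    weight: Callable[[float], float] = linear_weight,
) -> List[float]:
    """Blend the posterior mean x_star with the ground truth x at time t.

    Returns (1 - w(t)) * x_star + w(t) * x, element by element.
    """
    if len(x_star) != len(x):
        raise ValueError("x_star and x must have the same length")
    w = weight(t)
    return [(1.0 - w) * m + w * g for m, g in zip(x_star, x)]
```

With this assumed convention, a weight near 0 yields a target close to the posterior mean (low distortion) and a weight near 1 yields a target close to the ground truth (high perceptual quality); any other schedule can be passed in through the `weight` argument.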
