Solving Bayesian inverse problems with diffusion priors and off-policy RL
Luca Scimeca, Siddarth Venkatraman, Moksh Jain, Minsu Kim, Marcin Sendera, Mohsin Hasan, Luke Rowe, Sarthak Mittal, Pablo Lemos, Emmanuel Bengio, Alexandre Adam, Jarrid Rector-Brooks, Yashar Hezaveh, Laurence Perreault-Levasseur, Yoshua Bengio, Glen Berseth, Nikolay Malkin
TL;DR
The paper tackles Bayesian inverse problems in high dimensions, where exact posterior sampling is intractable. It uses Relative Trajectory Balance (RTB), an off-policy RL objective, to finetune a diffusion prior so that samples follow $p_{\text{post}}(x) \propto p_{\text{prior}}(x)\, r(x)$, enabling efficient posterior sampling. It extends RTB to conditional posteriors, incorporates off-policy adaptation, remains compatible with methods such as DPS and FPS, and is validated on linear and nonlinear vision tasks and on gravitational lensing, reporting metrics such as $\log Z$ to assess posterior fidelity. The results show that RTB achieves competitive posterior quality with better mode coverage and stability than training-free baselines, suggesting practical applicability to a broad range of scientific inverse problems.
Abstract
This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (RL) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.
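To make the posterior target $p_{\text{post}}(x) \propto p_{\text{prior}}(x)\, r(x)$ concrete, the following is a minimal sketch of a trajectory-balance style residual. The text above does not state the loss, so the form used here is an assumption: a squared log-ratio between $Z \cdot p_{\text{post}}(\tau)$ and $r(x) \cdot p_{\text{prior}}(\tau)$ along a sampling trajectory $\tau$. The function name `rtb_loss`, its arguments, and the three-state toy distribution are all hypothetical and chosen only for illustration; no diffusion model is involved.

```python
import numpy as np

def rtb_loss(log_z, logp_post_steps, logp_prior_steps, log_r):
    """Squared log-ratio residual for one trajectory (assumed RTB-style form).

    log_z            : scalar estimate of the log normalising constant
    logp_post_steps  : per-step log-probabilities under the posterior sampler
    logp_prior_steps : per-step log-probabilities under the prior sampler
    log_r            : log of the constraint/likelihood term at the final sample
    """
    residual = (log_z + np.sum(logp_post_steps)
                - np.sum(logp_prior_steps) - log_r)
    return residual ** 2

# Toy one-step "trajectory" over three discrete states (illustrative only).
prior = np.array([0.5, 0.3, 0.2])
r = np.array([1.0, 2.0, 4.0])
z = float(np.sum(prior * r))   # exact normalising constant
post = prior * r / z           # exact posterior, proportional to prior * r

# With the exact posterior and exact Z, the residual vanishes for every state.
exact_losses = [
    rtb_loss(np.log(z), [np.log(post[x])], [np.log(prior[x])], np.log(r[x]))
    for x in range(3)
]

# If the sampler has not moved away from the prior, the residual is non-zero.
untrained_losses = [
    rtb_loss(np.log(z), [np.log(prior[x])], [np.log(prior[x])], np.log(r[x]))
    for x in range(3)
]
```

In this toy setting the residual is zero for every state exactly when the sampler matches the posterior and `log_z` matches the true normaliser, which is why a learned `log_z` can double as an estimate of $\log Z$. Because the residual is evaluated per trajectory, it can in principle be computed on trajectories drawn from any sampler, which is what makes an objective of this kind usable off-policy.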
