RetroBridge: Modeling Retrosynthesis with Markov Bridges
Ilia Igashov, Arne Schneuing, Marwin Segler, Michael Bronstein, Bruno Correia
TL;DR
This work reframes single-step retrosynthesis as learning the probabilistic dependency between two intractable discrete distributions: the distribution over products $p_{\mathcal{X}}$ and the distribution over reactants $p_{\mathcal{Y}}$. It introduces the Markov Bridge Model, a generative framework that approximates this dependency by sampling trajectories between paired endpoints, and applies it to chemistry through RetroBridge, a template-free single-step retrosynthesis method. On USPTO-50k, RetroBridge achieves state-of-the-art performance among template-free approaches and is competitive with template-based methods; it also provides an explicit uncertainty-based score for ranking samples. The results highlight the advantage of a direct, sequential probabilistic mapping over diffusion-based methods when the task is to map one discrete distribution onto another, and point to future work on conditioning, reaction types, and multi-step planning for practical deployment.
Abstract
Retrosynthesis planning is a fundamental challenge in chemistry that aims to design reaction pathways from commercially available starting materials to a target molecule. Each step in multi-step retrosynthesis planning requires accurate prediction of possible precursor molecules given the target molecule, along with confidence estimates to guide heuristic search algorithms. We model single-step retrosynthesis planning as a distribution learning problem in a discrete state space. First, we introduce the Markov Bridge Model, a generative framework that aims to approximate the dependency between two intractable discrete distributions accessible via a finite sample of coupled data points. Our framework is based on the concept of a Markov bridge, a Markov process pinned at its endpoints. Unlike diffusion-based methods, our Markov Bridge Model does not need a tractable noise distribution as a sampling proxy and directly operates on the input product molecules as samples from the intractable prior distribution. We then address the retrosynthesis planning problem with our novel framework and introduce RetroBridge, a template-free retrosynthesis modeling approach that achieves state-of-the-art results on standard evaluation benchmarks.
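To make the idea of "a Markov process pinned at its endpoints" concrete, the following toy sketch samples one trajectory of a discrete Markov bridge between a start sequence and an end sequence. It is an illustration under stated assumptions, not the RetroBridge implementation: the function name `sample_bridge`, the per-position independence, and the jump schedule `1 / (num_steps - t)` are all hypothetical choices made here for simplicity, and the token sequences stand in for molecular graphs.

```python
import random


def sample_bridge(x0, y, num_steps, rng):
    """Sample one trajectory of a toy discrete Markov bridge from x0 to y.

    Assumption (illustrative only): at step t, each position independently
    jumps to its endpoint value y[i] with probability 1 / (num_steps - t)
    and otherwise keeps its current value. The jump probability reaches 1
    at the final step, so every trajectory is pinned at y.
    """
    if len(x0) != len(y):
        raise ValueError("x0 and y must have the same length")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")

    state = list(x0)
    trajectory = [tuple(state)]  # the trajectory starts exactly at x0
    for t in range(num_steps):
        jump_prob = 1.0 / (num_steps - t)
        # The next state depends only on the current state, the step index,
        # and the fixed endpoint y, which is what makes this a Markov bridge.
        state = [
            y_i if rng.random() < jump_prob else s_i
            for s_i, y_i in zip(state, y)
        ]
        trajectory.append(tuple(state))
    return trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical token sequences standing in for product and reactant.
    product = ("C", "C", "O")
    reactant = ("C", "N", "O")
    for step, state in enumerate(sample_bridge(product, reactant, 5, rng)):
        print(step, state)
```

Note that the process starts from the data point itself rather than from a sample of a tractable noise distribution, which is the contrast with diffusion-based methods drawn in the abstract. In a learned model, the endpoint `y` is unknown at sampling time and would be replaced by a network's prediction given the current state.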
