Amortized Sampling with Transferable Normalizing Flows
Charlie B. Tan, Majdi Hassan, Leon Klein, Saifuddin Syed, Dominique Beaini, Michael M. Bronstein, Alexander Tong, Kirill Neklyudov
TL;DR
Prose is a large-scale, transferable all-atom normalizing flow trained on peptide MD data that enables zero-shot, uncorrelated sampling across varying sequence lengths up to eight residues. It combines TarFlow-style autoregressive flows with adaptive conditioning and chemistry-aware permutations to achieve cross-system transfer while retaining efficient likelihood evaluation, and uses inference based on self-normalized importance sampling (SNIS) with self-improvement and temperature-transfer capabilities. Across the ManyPeptidesMD dataset, Prose delivers state-of-the-art sampling performance, surpassing MD under similar compute and providing robust transfer to unseen systems and temperatures. The work is complemented by extensive ablations, multiple sampling strategies, and open-source resources to encourage further exploration of amortized sampling methods in molecular modeling.
Abstract
Efficient equilibrium sampling of molecular conformations remains a core challenge in computational chemistry and statistical inference. Classical approaches such as molecular dynamics or Markov chain Monte Carlo inherently lack amortization; the computational cost of sampling must be paid in full for each system of interest. The widespread success of generative models has inspired interest in overcoming this limitation by learning sampling algorithms. Despite performing competitively with conventional methods when trained on a single system, learned samplers have so far demonstrated limited ability to transfer across systems. We demonstrate that deep learning enables the design of scalable and transferable samplers by introducing Prose, a 285-million-parameter all-atom transferable normalizing flow trained on a corpus of peptide molecular dynamics trajectories up to 8 residues in length. Prose draws zero-shot uncorrelated proposal samples for arbitrary peptide systems, achieving previously intractable transferability across sequence lengths, whilst retaining the efficient likelihood evaluation of normalizing flows. Through extensive empirical evaluation we demonstrate the efficacy of Prose as a proposal for a variety of sampling algorithms, finding that a simple importance-sampling-based finetuning procedure achieves performance competitive with established methods such as sequential Monte Carlo. We open-source the Prose codebase, model weights, and training dataset to further stimulate research into amortized sampling methods and finetuning objectives.
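To illustrate how a proposal with tractable likelihood can be used for importance sampling, the following is a minimal sketch of self-normalized importance sampling on a toy problem. It is not taken from the Prose codebase: the one-dimensional harmonic energy, the wider Gaussian standing in for the flow proposal, and all function and variable names are assumptions made for illustration only.

```python
import numpy as np

def snis_weights(log_target_unnorm, log_proposal):
    """Self-normalized importance weights.

    log_target_unnorm: unnormalized log-density of the target at each sample
                       (for a Boltzmann target this is -energy / kT).
    log_proposal:      exact log-density of the proposal at each sample
                       (for a normalizing flow this is its log-likelihood).
    """
    log_w = log_target_unnorm - log_proposal
    log_w = log_w - np.max(log_w)      # shift for numerical stability
    w = np.exp(log_w)
    return w / np.sum(w)               # weights sum to one

def effective_sample_size(weights):
    """Kish effective sample size of normalized weights."""
    return 1.0 / np.sum(weights ** 2)

# Toy setup (hypothetical): harmonic energy U(x) = x^2 / 2 at kT = 1,
# so the target is a standard normal known only up to a constant.
rng = np.random.default_rng(0)
n_samples = 50_000
proposal_std = 1.5                     # assumed stand-in for a learned proposal

# Draw independent (uncorrelated) proposal samples and evaluate both densities.
x = rng.normal(0.0, proposal_std, size=n_samples)
log_q = -0.5 * (x / proposal_std) ** 2 - np.log(proposal_std) - 0.5 * np.log(2 * np.pi)
log_p_unnorm = -0.5 * x ** 2

weights = snis_weights(log_p_unnorm, log_q)
ess = effective_sample_size(weights)

# Reweighted estimate of E[x^2] under the target (exact value is 1).
estimate = np.sum(weights * x ** 2)
```

In this sketch the effective sample size indicates how well the proposal overlaps the target: the closer it is to the number of samples, the less reweighting is needed.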
