
TRADE: Transfer of Distributions between External Conditions with Normalizing Flows

Stefan Wahl, Armand Rousselot, Felix Draxler, Henrik Schopmans, Ullrich Köthe

TL;DR

TRADE introduces a boundary-value PDE framework for learning conditional distributions $p_\theta(x|c)$ across external parameters. It anchors the model at a reference condition $c_0$ with $p_\theta(x|c_0)=p(x|c_0)$ and propagates the solution using the gradient constraint $\frac{\mathrm d}{\mathrm d c}p_\theta(x|c)=\frac{\mathrm d}{\mathrm d c}p(x|c)$. The key insight is to express $\frac{\mathrm d}{\mathrm d c}\log p(x|c)$ in terms of the unnormalized density $q(x|c)$ as $\frac{\mathrm d}{\mathrm d c}\log p(x|c)=\frac{\mathrm d}{\mathrm d c}\log q(x|c)-\mathrm{E}_{p(x|c)}\big[\frac{\mathrm d}{\mathrm d c}\log q(x|c)\big]$. This enables a tractable, physics-informed loss that combines a boundary term with a gradient-consistency term. TRADE supports data-free and energy-free variants, uses self-normalized importance sampling to estimate the required expectations, and can either discretize or continuously sample the conditioning space. Empirically, it outperforms baselines on multidimensional wells, tempered Bayesian inference, alanine dipeptide temperature transfer, and scalar-field lattice models, generalizing robustly across a wide range of external parameters and problem domains. The approach promises practical impact for simulations and inference tasks where sampling across parameter regimes is expensive or infeasible. It provides a flexible, stable alternative to energy-based training or heavily restricted architectures.
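
The key identity can be evaluated with self-normalized importance sampling (SNIS). Below is a minimal NumPy sketch. It assumes we have samples from some proposal with a known log density; in TRADE this role would presumably be played by the flow itself. The helper name `snis_dlogp_dc` and the Gaussian toy target $q(x|\beta)=\exp(-\beta x^2/2)$ are illustrative assumptions, not the authors' implementation. For this toy target the exact answer is $-x^2/2+1/(2\beta)$, which the estimate should approach.

```python
import numpy as np

def snis_dlogp_dc(x_eval, x_prop, log_prop, log_q, dlogq_dc, c):
    """Hypothetical helper: estimate d/dc log p(x|c) at x_eval via the key identity.

    x_prop   -- samples from a proposal distribution (e.g. the current flow at c)
    log_prop -- log density of the proposal evaluated at x_prop
    log_q    -- callable (x, c) -> unnormalized target log density
    dlogq_dc -- callable (x, c) -> derivative of log q with respect to c
    """
    # Self-normalized importance weights w_i proportional to q(x_i|c) / proposal(x_i)
    log_w = log_q(x_prop, c) - log_prop
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    # SNIS estimate of E_{p(x|c)}[d/dc log q(x|c)]
    expectation = np.sum(w * dlogq_dc(x_prop, c))
    return dlogq_dc(x_eval, c) - expectation

# Toy check (assumed example): q(x|beta) = exp(-beta * x^2 / 2), so p(x|beta) = N(0, 1/beta)
# and the exact derivative is d/dbeta log p(x|beta) = -x^2/2 + 1/(2*beta).
rng = np.random.default_rng(0)
beta, sigma_prop = 2.0, 2.0
x_prop = rng.normal(0.0, sigma_prop, size=200_000)
log_prop = -0.5 * (x_prop / sigma_prop) ** 2 - np.log(sigma_prop * np.sqrt(2.0 * np.pi))
x_eval = np.array([0.0, 1.0])
est = snis_dlogp_dc(
    x_eval, x_prop, log_prop,
    log_q=lambda x, b: -0.5 * b * x**2,
    dlogq_dc=lambda x, b: -0.5 * x**2,
    c=beta,
)
exact = -0.5 * x_eval**2 + 1.0 / (2.0 * beta)
print(est, exact)  # the estimate should be close to [0.25, -0.25]
```

The proposal here is deliberately wider than the target so the importance weights stay well behaved. A proposal that is much narrower than the target would make the SNIS estimate noisy.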

Abstract

Modeling distributions that depend on external control parameters is a common scenario in diverse applications like molecular simulations, where system properties like temperature affect molecular configurations. Despite the relevance of these applications, existing solutions are unsatisfactory as they require severely restricted model architectures or rely on energy-based training, which is prone to instability. We introduce TRADE, which overcomes these limitations by formulating the learning process as a boundary value problem. By initially training the model for a specific condition using either i.i.d. samples or backward KL training, we establish a boundary distribution. We then propagate this information across other conditions using the gradient of the unnormalized density with respect to the external parameter. This formulation, akin to the principles of physics-informed neural networks, allows us to efficiently learn parameter-dependent distributions without restrictive assumptions. Experimentally, we demonstrate that TRADE achieves excellent results in a wide range of applications, ranging from Bayesian inference and molecular simulations to physical lattice models.
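
The boundary-value formulation can be illustrated with a deliberately tiny, hypothetical model. The sketch below makes several assumptions:

- A one-parameter Gaussian family $p_\theta(x|c)=\mathcal N(0,s(c)^2)$ with $\log s(c)=a+bc$ stands in for a normalizing flow.
- The target is $q(x|c)=\exp(-e^c x^2/2)$.
- The gradient-consistency term is a squared error averaged over a fixed grid of conditions.
- A weight `lam` balances the two terms.

All names and choices are illustrative, and the paper's actual loss, weighting, and sampling of conditions may differ. The sketch only evaluates the loss; a real implementation would minimize it with automatic differentiation.

```python
import numpy as np

# Hypothetical toy setup: a Gaussian "flow" p_theta(x|c) = N(0, s(c)^2), log s(c) = a + b*c,
# and an unnormalized target q(x|c) = exp(-e^c x^2 / 2). Not the authors' model or loss.

def log_model(x, c, a, b):
    log_s = a + b * c
    return -0.5 * x**2 * np.exp(-2.0 * log_s) - log_s - 0.5 * np.log(2.0 * np.pi)

def dlog_model_dc(x, c, a, b):
    # Analytic d/dc log p_theta(x|c) = b * (x^2 / s(c)^2 - 1)
    s2 = np.exp(2.0 * (a + b * c))
    return b * (x**2 / s2 - 1.0)

def log_q(x, c):
    return -0.5 * np.exp(c) * x**2

def dlogq_dc(x, c):
    return -0.5 * np.exp(c) * x**2

def trade_style_loss(a, b, x_data, c0, c_grid, rng, n=4096, lam=1.0):
    # Boundary term: NLL of i.i.d. samples available at the reference condition c0
    boundary = -np.mean(log_model(x_data, c0, a, b))
    grad_terms = []
    for c in c_grid:
        # Draw samples from the model at condition c
        x = rng.normal(0.0, np.exp(a + b * c), size=n)
        # SNIS estimate of E_{p(x|c)}[d/dc log q], using the model as proposal
        log_w = log_q(x, c) - log_model(x, c, a, b)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        target = dlogq_dc(x, c) - np.sum(w * dlogq_dc(x, c))
        # Gradient-consistency term: squared mismatch of d/dc log-densities
        grad_terms.append(np.mean((dlog_model_dc(x, c, a, b) - target) ** 2))
    return boundary + lam * np.mean(grad_terms)

rng = np.random.default_rng(0)
c0 = 0.0
x_data = rng.normal(0.0, 1.0, size=10_000)  # exact samples at c0, where Var = e^{-c0} = 1
c_grid = np.linspace(-1.0, 1.0, 5)
loss_true = trade_style_loss(0.0, -0.5, x_data, c0, c_grid, rng)   # matches the target for all c
loss_const = trade_style_loss(0.0, 0.0, x_data, c0, c_grid, rng)   # ignores c entirely
boundary_only = -np.mean(log_model(x_data, c0, 0.0, -0.5))
print(loss_true, loss_const)
```

With $a=0, b=-0.5$ the model matches the target at every $c$, so the gradient term nearly vanishes. A condition-independent model ($b=0$) fits the boundary equally well, since both share $s(c_0)=1$, but the gradient term penalizes it.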

Paper Structure

This paper contains 46 sections, 5 theorems, 48 equations, 10 figures, 9 tables.

Key Result

Theorem 4.1

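Given an unnormalized density $q(x|c)$ that is differentiable in $c$, the derivative of the normalized density $p(x|c)$ with respect to $c$ reads (in log form):

$$\frac{\mathrm d}{\mathrm d c}\log p(x|c)=\frac{\mathrm d}{\mathrm d c}\log q(x|c)-\mathrm{E}_{p(x|c)}\Big[\frac{\mathrm d}{\mathrm d c}\log q(x|c)\Big]$$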
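
A short derivation sketch follows. It assumes $q(x|c)$ is regular enough that differentiation in $c$ and integration over $x$ can be exchanged, which is presumably the role of the dominated convergence theorem listed as Theorem A.4 below. Write $p(x|c)=q(x|c)/Z(c)$ and differentiate the log-normalizer:

```latex
\begin{aligned}
\log p(x|c) &= \log q(x|c) - \log Z(c), \qquad Z(c) = \int q(x'|c)\,\mathrm{d}x', \\
\frac{\mathrm d}{\mathrm d c}\log Z(c)
  &= \frac{1}{Z(c)} \int \frac{\mathrm d}{\mathrm d c} q(x'|c)\,\mathrm{d}x'
   = \int \frac{q(x'|c)}{Z(c)}\,\frac{\mathrm d}{\mathrm d c}\log q(x'|c)\,\mathrm{d}x'
   = \mathrm{E}_{p(x'|c)}\Big[\frac{\mathrm d}{\mathrm d c}\log q(x'|c)\Big], \\
\Rightarrow\quad \frac{\mathrm d}{\mathrm d c}\log p(x|c)
  &= \frac{\mathrm d}{\mathrm d c}\log q(x|c) - \mathrm{E}_{p(x'|c)}\Big[\frac{\mathrm d}{\mathrm d c}\log q(x'|c)\Big].
\end{aligned}
```

For instance, a tempered target $q(x|\beta)=\exp(-\beta E(x))$ gives $\frac{\mathrm d}{\mathrm d\beta}\log p(x|\beta)=-E(x)+\mathrm{E}_{p(x'|\beta)}[E(x')]$. The gradient at $x$ is the negative energy, shifted by its mean under the current distribution.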

Figures (10)

  • Figure 1: Our approach to train a conditional normalizing flow $p_\theta(x|c)$. Left: At $c=c_0$, the flow is trained using NLL. Right: By learning the gradient of the distribution with respect to $c$ based on prior knowledge, the distribution learned at $c_0$ is propagated to other conditions $c \neq c_0$ without additional training data.
  • Figure 2: Calibration curves of the estimated $r$ at different likelihood powers $\beta$ for the two moons dataset. We compare TRADE to a model trained only at $\beta=1.0$ for $\beta \in \{0.5, 1, 2\}$. Our method successfully generalizes to different $\beta$ while maintaining the same performance at $\beta=1.0$. The model trained only at $\beta=1.0$ fails to generalize to other values of $\beta$.
  • Figure 3: A comparison of TSF and backward KL training with TRADE trained on MD simulations of Alanine Dipeptide. Both models were trained at 600K and are evaluated at 300K. From left to right: Ramachandran plots of model samples (TSF, backward KL, TRADE) and molecular dynamics (MD), marginal density of the $\phi$ angle, and ground truth energy of model and MD samples. For each model, we plot the best of three runs.
  • Figure 4: Physical observables for the scalar field theory. TRADE and a combination of NLL and energy-based training accurately follow the ground truth obtained using MCMC, while NLL training alone does not. A: Expected absolute magnetization per spin. B: Expected ground truth action per spin. C: Susceptibility. D: Binder cumulant. The values marked with dots represent the $\kappa$ at which training data is available.
  • Figure 5: Relative ESS for the models trained for the scalar field theory. The values marked with dots represent the $\kappa$ at which training data is available. TRADE outperforms the baseline in most of the examined range of $\kappa$ and also fluctuates less.
  • ...and 5 more figures
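
Figure 5 reports a relative ESS. The exact normalization used in the paper is not shown in this excerpt, so the sketch below assumes a common definition: Kish's effective sample size divided by the number of samples. It is computed from log importance weights $\log w_i=\log q(x_i|c)-\log p_\theta(x_i|c)$ for samples $x_i\sim p_\theta(\cdot|c)$. The helper name `relative_ess` is hypothetical.

```python
import numpy as np

def relative_ess(log_w):
    """Hypothetical helper: Kish effective sample size divided by the sample count.

    log_w -- unnormalized log importance weights, e.g. log q(x_i|c) - log p_theta(x_i|c)
    """
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())  # rescaling all weights does not change the ratio below
    return w.sum() ** 2 / (w.size * np.sum(w**2))

print(relative_ess(np.zeros(1000)))                        # identical weights -> 1.0
print(relative_ess(np.array([0.0, -50.0, -50.0, -50.0])))  # one dominant weight -> about 0.25
```

A value of 1 means the flow matches the target up to normalization. Values near $1/n$ mean a single sample dominates the weighted estimate.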

Theorems & Definitions (11)

  • Theorem 4.1
  • proof
  • Theorem A.1
  • proof
  • Theorem A.2
  • proof
  • Remark A.1
  • Theorem A.3
  • Theorem A.4: The Lebesgue Dominated Convergence Theorem (Proposition 6 of [7TheIntegralofUnboundedFunctions])
  • proof
  • ...and 1 more