Free-form Flows: Make Any Architecture a Normalizing Flow

Felix Draxler, Peter Sorrenson, Lea Zimmermann, Armand Rousselot, Ullrich Köthe

TL;DR

Free-form Flows (FFF) remove the architectural invertibility constraints of traditional normalizing flows: dimension-preserving networks are trained with a fast maximum-likelihood surrogate and a reconstruction loss, so the architecture can be chosen for its inductive biases rather than for invertibility. The core idea is a gradient estimator for the log-determinant of the Jacobian, built from a trace trick and an inverse-function insight, which replaces exact Jacobian determinants with efficient vector-Jacobian and Jacobian-vector products. Theoretical results show that minimizing the FFF objective yields the same global minima as classical maximum-likelihood training when the reconstruction loss is small, and that the relaxed objective upper-bounds a spread KL divergence between data and model. Empirically, FFF matches or surpasses invertible-flow baselines on simulation-based inference (SBI) and molecule-generation benchmarks, while offering much faster sampling and easier adaptation to domain-specific architectures, such as $E(n)$-equivariant networks for QM9. The framework broadens the applicability of likelihood-based generative modeling to diverse scientific problems by shifting the focus from strict invertibility to task-tailored inductive biases and efficient gradient estimation.
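
The trace trick mentioned above can be illustrated on a toy linear model. The sketch below is not the authors' implementation: it assumes an encoder $f(x) = Ax$ and a decoder $g(z) = Bz$ with explicit matrices (the names `A`, `B`, and `surrogate_logdet_grad` are hypothetical), and it forms the products directly instead of using automatic vector-Jacobian and Jacobian-vector products. It checks that averaging the gradient of $v^\top B A v$ with respect to $A$ over random probes $v$ (with $B$ held fixed) recovers the exact gradient of $\log|\det A|$, which is $A^{-\top}$, when the decoder is the exact inverse.

```python
import numpy as np

def surrogate_logdet_grad(B, num_probes, rng):
    """Monte Carlo estimate of d/dA log|det A| for f(x) = A x.

    Assumption: the decoder Jacobian B is treated as a constant stand-in
    for the inverse of A, so only B is needed to form the estimate.
    """
    D = B.shape[0]
    # Rademacher probes: entries are +1 or -1, so E[v v^T] = I.
    V = rng.choice([-1.0, 1.0], size=(num_probes, D))
    # For one probe v, the scalar v^T B A v is linear in A and its
    # gradient w.r.t. A is the outer product (B^T v) v^T.
    # (V @ B)[n] equals B^T v_n, so this averages those outer products.
    return (V @ B).T @ V / num_probes

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5, 0.0],
              [0.1, 1.5, 0.3],
              [0.0, 0.2, 1.0]])
B = np.linalg.inv(A)            # idealized, perfectly trained decoder
exact = np.linalg.inv(A).T      # exact gradient of log|det A| w.r.t. A
estimate = surrogate_logdet_grad(B, 20000, rng)
max_error = np.max(np.abs(estimate - exact))
print("max abs error of the trace estimate:", max_error)
```

In a nonlinear network the same quantity would be obtained from one Jacobian-vector product and one vector-Jacobian product per probe, which avoids ever materializing the Jacobian.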

Abstract

Normalizing Flows are generative models that directly maximize the likelihood. Previously, the design of normalizing flows was largely constrained by the need for analytical invertibility. We overcome this constraint by a training procedure that uses an efficient estimator for the gradient of the change of variables formula. This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training. Our approach allows placing the emphasis on tailoring inductive biases precisely to the task at hand. Specifically, we achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks. Moreover, our method is competitive in an inverse problem benchmark, while employing off-the-shelf ResNet architectures.

Paper Structure

This paper contains 42 sections, 11 theorems, 92 equations, 5 figures, 7 tables, 2 algorithms.

Key Result

Theorem 3.1

Let $f_\theta: \mathbb R^D \to \mathbb R^D$ be a $C^1$ invertible function parameterized by $\theta$. Then, for all $x \in \mathbb R^D$:
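
The displayed equation of the theorem is missing from this extract and is not reconstructed here. For orientation only, the classical identity that underlies trace-based gradients of a log-determinant is Jacobi's formula, written below in the notation of the statement above. That the theorem takes exactly this form is an assumption; the symbols $J_\theta(x)$ and $\theta_i$ are introduced here for illustration, and the identity additionally requires $J_\theta(x)$ to be differentiable in $\theta$.

$$
\frac{\partial}{\partial \theta_i} \log\left|\det J_\theta(x)\right|
= \operatorname{tr}\!\left( J_\theta(x)^{-1} \, \frac{\partial J_\theta(x)}{\partial \theta_i} \right),
\qquad
J_\theta(x) = \frac{\partial f_\theta(x)}{\partial x}.
$$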

Figures (5)

  • Figure 1: Free-form flows (FFF) train a pair of encoder and decoder neural networks with a fast maximum likelihood estimator $\mathcal{L}_{\operatorname{ML}}^{g}$ and reconstruction loss $\mathcal{L}_{\operatorname{R}}$. This enables training any dimension-preserving architecture as a one-step generative model. For example, an equivariant graph neural network can be trained on the QM9 dataset to generate molecules by predicting atom positions and properties in a single decoder evaluation. (Bottom) Stable molecules sampled from our $E(3)$-FFF trained on the QM9 dataset for several molecule sizes.
  • Figure 2: Gradient landscape of $\mathcal{L}^{f^{-1}}$ (left) and $\mathcal{L}^g$ (right) for a linear 1D model with $f(x) = ax$, $g(z) = bz$, $q(x) = \mathcal{N}(0, 1.5^2)$ and $\beta = 1$. The flow lines show the direction and the contours show the magnitude of the gradient. White dots are critical points. $\mathcal{L}^g$ has the same minima $(\pm 2/3, \pm 1.5)$ as $\mathcal{L}^{f^{-1}}$, with an additional critical point at $a=b=0$. This is a saddle, so we will not converge to it in practice. Therefore optimizing $\mathcal{L}^g$ results in the same solutions as $\mathcal{L}^{f^{-1}}$.
  • Figure 3: C2ST accuracy on the SBI benchmark datasets. We compare our method (FFF) against flow matching (FM) [wildberger2023flow] and the neural spline flow (NSF) baseline from the benchmark [lueckmann2021benchmarking]. The accuracy is averaged over ten different observations, with error bars indicating the standard deviation. Our performance is comparable to the competitors across all datasets, with no model being universally better or worse.
  • Figure 4: Solutions to $\mathcal{L}^{f^{-1}}$ for various $\beta$. The data is the two-component Gaussian mixture shown in the lower panels. Solid blue lines show $f_\theta$ and dashed orange lines show $g_\phi$. Note that $f_\theta$ is not invertible between the mixtures when $\beta$ is small.
  • Figure 5: Intuition behind the theorem labeled appthm:partitions: invalid solutions for learning a three-mode Gaussian mixture with non-invertible encoders (blue, orange, green), compared to an invertible encoder (red). (Left) Because the encoder is not invertible by construction, it may learn to reuse each latent code $z$ once for each disconnected component. This lowers the negative log-likelihood at each point, because the derivative $f_\theta'(x)$ is larger at each data point. The decoder (dotted gray line) then cannot reconstruct the data. (Right) Increasing $\beta$ raises the weight of reconstruction relative to maximum likelihood and thus selects the best solution (red).
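
The critical points quoted in the Figure 2 caption can be checked by hand for the linear 1D model. The sketch below assumes a standard normal latent and an objective of the form $\mathbb{E}_x[-\log p(f(x)) - \log|f'(x)| + \beta (x - g(f(x)))^2]$, which for $f(x) = ax$, $g(z) = bz$ and $x \sim \mathcal{N}(0, \sigma^2)$ reduces (up to a constant) to $a^2\sigma^2/2 - \log|a| + \beta(1 - ab)^2\sigma^2$. This loss form and the function names are assumptions made for illustration, not taken from the text above. The surrogate field replaces the exact inverse derivative $1/a$ with the decoder slope $b$.

```python
SIGMA = 1.5   # data standard deviation from the caption, q(x) = N(0, 1.5^2)
BETA = 1.0    # reconstruction weight from the caption

def grad_exact(a, b):
    """Gradient of a^2 s^2 / 2 - log|a| + beta (1 - a b)^2 s^2 (assumed loss)."""
    s2 = SIGMA ** 2
    da = a * s2 - 1.0 / a - 2.0 * BETA * (1.0 - a * b) * b * s2
    db = -2.0 * BETA * (1.0 - a * b) * a * s2
    return da, db

def grad_surrogate(a, b):
    """Same field, but the log-determinant term uses b in place of 1/a."""
    s2 = SIGMA ** 2
    da = a * s2 - b - 2.0 * BETA * (1.0 - a * b) * b * s2
    db = -2.0 * BETA * (1.0 - a * b) * a * s2
    return da, db

def is_critical(grad_fn, a, b, tol=1e-12):
    """True if both gradient components vanish (up to tol) at (a, b)."""
    return all(abs(g) < tol for g in grad_fn(a, b))

# Points quoted in the caption: shared minima, plus the extra point at the origin.
shared = [(2.0 / 3.0, 1.5), (-2.0 / 3.0, -1.5)]
for a, b in shared:
    print((a, b), is_critical(grad_exact, a, b), is_critical(grad_surrogate, a, b))
print((0.0, 0.0), is_critical(grad_surrogate, 0.0, 0.0))
```

Under these assumptions both fields vanish at $(\pm 2/3, \pm 1.5)$, and only the surrogate field is defined (and zero) at $a = b = 0$, since the exact gradient contains $1/a$.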

Theorems & Definitions (18)

  • Theorem 3.1
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem A.1
  • proof
  • Theorem A.2
  • proof
  • Theorem A.3
  • proof
  • ...and 8 more