MetaGFN: Exploring Distant Modes with Adapted Metadynamics for Continuous GFlowNets

Dominic Phillips; Flaviu Cipcigan

TL;DR

The paper addresses exploration in continuous GFlowNets with black-box rewards. It introduces Adapted Metadynamics, which recasts the reward landscape as a potential $V(x) = - \frac{1}{\beta'} \ln r(x)$ and uses a KDE-based marginal potential to guide off-policy exploration, so that the target distribution $P^{\perp}$ can be learned without reward gradients. MetaGFN is the resulting training framework: it couples Adapted Metadynamics with a replay-buffered training loop and the TB, DB, and STB losses, and the paper proves consistency with standard metadynamics in the appropriate limit. Empirically, MetaGFN outperforms several off-policy exploration strategies across multiple continuous environments, including alanine dipeptide, discovering distant reward modes more reliably and reaching lower L1 error relative to the true reward distribution. The approach carries a technique from molecular dynamics over to generative modeling as a practical gradient-free exploration method for continuous action spaces, and the paper suggests extensions to adaptive collective variables (CVs) and well-tempered variants.

Abstract

Generative Flow Networks (GFlowNets) are a class of generative models that sample objects in proportion to a specified reward function through a learned policy. They can be trained either on-policy or off-policy, needing a balance between exploration and exploitation for fast convergence to a target distribution. While exploration strategies for discrete GFlowNets have been studied, exploration in the continuous case remains to be investigated, despite the potential for novel exploration algorithms due to the local connectedness of continuous domains. Here, we introduce Adapted Metadynamics, a variant of metadynamics that can be applied to arbitrary black-box reward functions on continuous domains. We use Adapted Metadynamics as an exploration strategy for continuous GFlowNets. We show several continuous domains where the resulting algorithm, MetaGFN, accelerates convergence to the target distribution and discovers more distant reward modes than previous off-policy exploration strategies used for GFlowNets.
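To make the reward-to-potential mapping $V(x) = - \frac{1}{\beta'} \ln r(x)$ concrete, the following is a minimal sketch of plain metadynamics on a one-dimensional, two-mode reward. It is not the paper's Adapted Metadynamics algorithm: the reward function, the Metropolis random-walk moves (used here because they need no reward gradients), the grid-stored bias, and every parameter value are assumptions chosen for illustration only.

```python
import numpy as np


def reward(x):
    """Hypothetical black-box reward: two narrow modes at x = -2 and x = +2."""
    return np.exp(-(x + 2.0) ** 2 / 0.18) + np.exp(-(x - 2.0) ** 2 / 0.18) + 1e-12


def potential(x, beta=1.0):
    """Potential V(x) = -(1/beta') ln r(x) associated with the reward."""
    return -np.log(reward(x)) / beta


def run_metadynamics(n_steps=20000, stride=10, height=0.5, width=0.3,
                     beta=1.0, step=0.2, seed=0):
    """Metropolis random walk on V(x) + bias(x), depositing Gaussian hills.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(-4.0, 4.0, 401)   # bias is stored on a fixed grid
    bias = np.zeros_like(grid)
    x = -2.0                             # start in the left mode
    samples = np.empty(n_steps)
    for t in range(n_steps):
        proposal = x + step * rng.normal()
        if -4.0 <= proposal <= 4.0:      # proposals outside the domain are rejected
            d_total = (potential(proposal, beta) + np.interp(proposal, grid, bias)
                       - potential(x, beta) - np.interp(x, grid, bias))
            # Metropolis acceptance on the biased potential
            if rng.random() < np.exp(min(0.0, -beta * d_total)):
                x = proposal
        if t % stride == 0:
            # deposit a Gaussian hill centred on the current position
            bias += height * np.exp(-(grid - x) ** 2 / (2.0 * width ** 2))
        samples[t] = x
    return grid, bias, samples


if __name__ == "__main__":
    _, _, biased = run_metadynamics()
    _, _, unbiased = run_metadynamics(height=0.0)
    print("biased walker range:  ", biased.min(), biased.max())
    print("unbiased walker range:", unbiased.min(), unbiased.max())
```

In this toy setting the barrier between the modes is roughly 21 units of $1/\beta'$, so an unbiased walker is expected to stay in the mode where it starts, while the accumulated bias gradually fills that well and pushes the walker toward the distant mode. This is the flattening effect that the Figure 1 caption describes.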

Paper Structure (21 sections, 3 theorems, 32 equations, 12 figures, 1 table, 3 algorithms)

This paper contains 21 sections, 3 theorems, 32 equations, 12 figures, 1 table, 3 algorithms.

Introduction
Preliminaries
Discrete GFlowNets
Continuous GFlowNets
Exploration strategies for GFlowNets
Metadynamics and collective variables
MetaGFN: Adapted Metadynamics for GFlowNets
Adapted Metadynamics
MetaGFN
Experiments
Results
Limitations
Conclusions
Loss functions
Langevin dynamics
...and 6 more sections

Key Result

Theorem 3.1

If the collective variable $z(x)$ is analytic with a bounded domain, then where $V = V(z') \vcentcolon= \int_{\mathcal{X}} \delta(z' - z(x))V(x) \mathrm{d} x$.

Figures (12)

Figure 1: Illustration of metadynamics in a multi-well potential. Regular deposition of a bias leads to a total potential that gradually flattens, encouraging exploration.
Figure 2: Line environment reward density.
Figure 3: Free energy surface of alanine dipeptide.
Figure 4: Grid environment reward density in dimension 2.
Figure 5: The L1 difference between on-policy and reward distribution during training for different loss functions and exploration strategies. The mean is plotted with standard error over 10 repeats. DB - Detailed Balance loss, TB - Trajectory Balance loss, STB - Subtrajectory Balance loss.
...and 7 more figures

Theorems & Definitions (7)

Definition 2.1: Measurable pointed graph (MPG)
Theorem 3.1
Lemma C.1
proof
Lemma C.2
proof
proof
