Boosted GFlowNets: Improving Exploration via Sequential Learning

Pedro Dall'Antonia, Tiago da Silva, Daniel Augusto de Souza, César Lincoln C. Mattos, Diego Mesquita

TL;DR

Boosted GFlowNets (BGFNs) address exploration bias in Generative Flow Networks by training an ensemble sequentially: each booster learns the residual reward not captured by earlier stages, reallocating probability mass toward hard-to-reach modes while staying within the trajectory-balance (TB) framework. The approach formalizes a residual target via $R(x)=\widehat{R}_{old}(x)+R_{res}(x)$ and introduces a boosted loss $\mathcal{L}_{boost}$ that yields monotone improvement and causes no degradation when past models already approximate the target. Sampling from the ensemble uses a mass-weighted mixture over boosters, with a theoretical guarantee that the induced terminal distribution remains proportional to $R(x)$. Empirically, BGFNs improve mode coverage and diversity on synthetic multimodal landscapes and on antimicrobial peptide (AMP) sequence design, while maintaining TB stability and avoiding degradation from redundant boosters.
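
The residual decomposition and the mass-weighted mixture can be illustrated on a toy discrete space. The sketch below is not the authors' implementation: the four-state reward, the captured mass `captured_old`, and the helper `normalize` are hypothetical, and each stage is assumed to sample exactly in proportion to its own target (an idealized, fully trained GFlowNet).

```python
# Toy illustration on a discrete space with four terminal states.
# All values and names here are hypothetical, chosen only for illustration.

def normalize(weights):
    """Turn non-negative weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]

# Target reward R(x): two "easy" high-reward states, two "hard" low-reward states.
reward = [4.0, 4.0, 1.0, 1.0]

# Assumed mass already captured by the earlier stage, R_hat_old(x):
# it covers the easy states and misses the hard ones entirely.
captured_old = [4.0, 4.0, 0.0, 0.0]

# Residual target, from R(x) = R_hat_old(x) + R_res(x).
residual = [r - c for r, c in zip(reward, captured_old)]

# Idealized stages: each samples proportionally to its own target.
p_old = normalize(captured_old)
p_res = normalize(residual)

# Mass-weighted mixture: pick a stage with probability proportional to its total mass.
z_old, z_res = sum(captured_old), sum(residual)
w_old = z_old / (z_old + z_res)
w_res = z_res / (z_old + z_res)
mixture = [w_old * a + w_res * b for a, b in zip(p_old, p_res)]

# The mixture matches the distribution proportional to R(x).
target = normalize(reward)
print(mixture)
print(target)
```

In this idealized setting the booster places all of its mass on the states the first stage missed, and weighting the stages by their total mass recovers a terminal distribution proportional to $R(x)$.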

Abstract

Generative Flow Networks (GFlowNets) are powerful samplers for compositional objects that, by design, sample proportionally to a given non-negative reward. Nonetheless, in practice, they often struggle to explore the reward landscape evenly: trajectories toward easy-to-reach regions dominate training, while hard-to-reach modes receive vanishing or uninformative gradients, leading to poor coverage of high-reward areas. We address this imbalance with Boosted GFlowNets, a method that sequentially trains an ensemble of GFlowNets, each optimizing a residual reward that compensates for the mass already captured by previous models. This residual principle reactivates learning signals in underexplored regions and, under mild assumptions, ensures a monotone non-degradation property: adding boosters cannot worsen the learned distribution and typically improves it. Empirically, Boosted GFlowNets achieve substantially better exploration and sample diversity on multimodal synthetic benchmarks and peptide design tasks, while preserving the stability and simplicity of standard trajectory-balance training.

Paper Structure

This paper contains 46 sections, 8 theorems, 72 equations, 5 figures, 5 tables.

Key Result

Theorem 1

Let $S := \{x \in \mathcal{X} : P_F(x) > 0\}$ be the support of the forward policy's terminal distribution. Assume that the Trajectory Balance loss has zero expectation, $\mathbb{E}_{\tau \sim P_F}[\mathcal{L}_{TB}(\tau)] = 0$. Further assume that for every terminal $x \in S$, the backward policy $P_B$ [...], where $\mathbb{I}[\cdot]$ is the indicator function.
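
The zero-loss assumption in the theorem can be made concrete with a small numerical sketch. It assumes the standard squared log-ratio form of the trajectory-balance loss, $\mathcal{L}_{TB}(\tau) = \big(\log Z + \sum_t \log P_F - \log R(x) - \sum_t \log P_B\big)^2$, which this summary does not spell out. The function name `tb_loss` and the two-step trajectory are hypothetical.

```python
import math

def tb_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
    """Squared trajectory-balance residual for a single trajectory.

    Assumes the standard form: (log Z + sum log P_F - log R - sum log P_B)^2.
    """
    residual = log_z + sum(log_pf_steps) - log_reward - sum(log_pb_steps)
    return residual ** 2

# Hypothetical two-step trajectory ending in a terminal state x with R(x) = 2.
log_reward = math.log(2.0)
log_pf = [math.log(0.5), math.log(0.5)]  # forward probability of the trajectory: 0.25
log_pb = [math.log(1.0), math.log(1.0)]  # each state has a single parent, so P_B = 1

# Balanced case: Z * P_F(tau) = 8 * 0.25 = 2 = R(x) * P_B(tau | x).
balanced = tb_loss(math.log(8.0), log_pf, log_pb, log_reward)

# Unbalanced case: an underestimated partition function Z = 4 leaves a nonzero residual.
unbalanced = tb_loss(math.log(4.0), log_pf, log_pb, log_reward)

print(balanced, unbalanced)
```

When the loss is zero on every trajectory the forward policy can sample, the flow-matching identity $Z \cdot P_F(\tau) = R(x) \cdot P_B(\tau \mid x)$ holds on that support, which is the regime the theorem analyzes.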

Figures (5)

  • Figure 1: Illustration of Boosted GFlowNets on a multimodal target. The single GFN covers only the nearest mode. The first and second boosters progressively capture additional modes, while later boosters learn to allocate negligible flow. Combined, the ensemble recovers the full target distribution.
  • Figure 2: Learning curves on synthetic targets. Solid lines show the mean $L_1$ distance across seeds; shaded bands denote $\pm1$ std. Vertical dashed lines (3k and 6k epochs) indicate booster activations for BGFN(2) and BGFN(3). The single-GFN baseline (TB) plateaus once easy modes are fit; boosted stages keep reducing error by reallocating mass toward hard modes.
  • Figure 3: Unique predicted-resistant peptides across noise levels. Every 50 epochs we sample 1,000 peptides from the current policy and accumulate the number of unique sequences whose predicted activity is at least $0.94$ for at least one microorganism. Curves show the cumulative count over epochs for a single GFN (blue) and its boosted counterparts (orange, green); shaded regions denote $\pm 1$ standard deviation across seeds. The $y$-axis is logarithmic. Vertical dashed lines mark activation of the first and second boosters. Left: off-policy $\varepsilon=0.2$; Right: off-policy $\varepsilon=0.3$.
  • Figure 4: Synthetic targets across exploration levels. Columns correspond to $\varepsilon\in\{0,0.1,0.2,0.3,0.4,0.5\}$. Rows (top to bottom): Eight-Gaussians, Rings, Moons. Each panel compares the $L_1$ distance between the true distribution and the model's distribution as exploration increases.
  • Figure 5: Peptide generation across exploration levels. Columns correspond to $\varepsilon\in\{0,0.1,0.2,0.3\}$. Each panel compares the number of unique peptides generated as exploration increases.

Theorems & Definitions (12)

  • Theorem 1: Zero variance at optimum
  • Theorem 2: Correctness of the Boosted Loss
  • Theorem 3: Correctness of the Ensemble Sampling Process
  • Proposition 1
  • Theorem 1: Zero variance at the TB optimum
  • proof
  • Theorem 2: Correctness of the boosted loss
  • proof
  • Theorem 3: Correctness of the sampling process
  • proof
  • ...and 2 more