
Loss-Guided Auxiliary Agents for Overcoming Mode Collapse in GFlowNets

Idriss Malek, Aya Laajil, Abhijith Sharma, Eric Moulines, Salem Lahlou

TL;DR

This work tackles mode collapse in Generative Flow Networks (GFlowNets) by introducing Loss-Guided GFlowNets (LGGFN), where an auxiliary GFlowNet is steered by the main model's training loss via the augmented reward $R_{aux} = R_{main} + \lambda \mathcal{L}_{main}$. The main idea is to bias exploration toward high-loss, underexplored regions while maintaining stability by training the main model on a mixture of its own on-policy trajectories and trajectories sampled by the auxiliary agent. The approach improves exploration efficiency and sample diversity across a range of benchmarks (Hypergrid, bit sequences, Bayesian structure learning, and mRNA design), including a more than 40x increase in unique modes and a ~99% reduction in exploration error on a challenging sequence generation task. The paper also offers theoretical intuition (an implicit curriculum and gradient variance reduction) and discusses robustness and possible extensions, presenting LGGFN as a simple, principled, and generalizable enhancement to GFlowNets for mitigating mode collapse and accelerating discovery in complex, sparse-reward domains such as scientific design.
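The augmented reward and the batch mixture described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that $\mathcal{L}_{main}$ is the per-trajectory trajectory-balance loss of the main model, evaluated on the trajectory that produced the terminal state $x$. The `Trajectory` container, the `mixed_batch` helper, and its `aux_fraction` parameter are hypothetical names introduced only for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Trajectory:
    # Hypothetical container holding quantities computed by the MAIN GFlowNet.
    log_pf: List[float]  # log P_F(s_{t+1} | s_t) for each transition
    log_pb: List[float]  # log P_B(s_t | s_{t+1}) for each transition
    log_reward: float  # log R_main(x) of the terminal state x


def tb_loss(traj: Trajectory, log_z: float) -> float:
    """Trajectory-balance loss of one trajectory (an ASSUMED choice of L_main)."""
    residual = log_z + sum(traj.log_pf) - traj.log_reward - sum(traj.log_pb)
    return residual ** 2


def auxiliary_reward(traj: Trajectory, log_z: float, lam: float) -> float:
    """R_aux = R_main + lam * L_main, with L_main taken on the trajectory that produced x."""
    return math.exp(traj.log_reward) + lam * tb_loss(traj, log_z)


def mixed_batch(
    on_policy: Sequence,
    auxiliary: Sequence,
    aux_fraction: float,
    batch_size: int,
    rng: random.Random,
) -> list:
    """Hypothetical mixing rule: a fixed fraction of the main model's batch
    comes from auxiliary samples, the rest from its own on-policy samples."""
    n_aux = round(aux_fraction * batch_size)
    return rng.sample(list(auxiliary), n_aux) + rng.sample(list(on_policy), batch_size - n_aux)


# Same trajectory under two values of log Z: when the main model fits it
# (zero residual), R_aux = R_main; when it does not, R_aux gains a loss bonus.
traj = Trajectory(log_pf=[0.0], log_pb=[0.0], log_reward=0.0)
print(auxiliary_reward(traj, log_z=0.0, lam=0.5))  # residual 0 -> 1.0
print(auxiliary_reward(traj, log_z=2.0, lam=0.5))  # residual 2 -> 1.0 + 0.5 * 4.0 = 3.0
```

Because the bonus is additive in reward space, terminal states the main model already fits well are rewarded exactly as in the original task, while poorly fit ones become more attractive to the auxiliary sampler in proportion to $\lambda$.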

Abstract

Although Generative Flow Networks (GFlowNets) are designed to capture multiple modes of a reward function, they often suffer from mode collapse in practice, getting trapped in early-discovered modes and requiring prolonged training to find diverse solutions. Existing exploration techniques often rely on heuristic novelty signals. We propose Loss-Guided GFlowNets (LGGFN), a novel approach where an auxiliary GFlowNet's exploration is directly driven by the main GFlowNet's training loss. By prioritizing trajectories where the main model exhibits high loss, LGGFN focuses sampling on poorly understood regions of the state space. This targeted exploration significantly accelerates the discovery of diverse, high-reward samples. Empirically, across diverse benchmarks including grid environments, structured sequence generation, Bayesian structure learning, and biological sequence design, LGGFN consistently outperforms baselines in exploration efficiency and sample diversity. For instance, on a challenging sequence generation task, it discovered over 40 times more unique valid modes while simultaneously reducing the exploration error metric by approximately 99%.

Paper Structure

This paper contains 32 sections, 13 equations, 12 figures, 6 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Directed chain of $N+1$ states. Extremal nodes $s_0$ and $s_N$ have high reward, while all intermediate states have low reward.
  • Figure 2: Evolution of $P_F(\texttt{exit} \mid s_0)$ for different algorithms as a function of the number of sampled trajectories. The chain has 100 states ($N = 99$), with rewards $R(s_0) = R(s_N) = 101$ and $R(s_i) = 1$ for $0 < i < N$. At convergence, $P_F(\texttt{exit} \mid s_0) = R(s_0)/Z = \frac{101}{300} \approx 0.337$, where $Z = 101 + 98 + 101 = 300$.
  • Figure 3: L1-loss during training as a function of training iterations, for different hypergrid sizes.
  • Figure 4: L1-loss during training as a function of training iterations, for different values of $R_0$ with the hypergrid size fixed at 128x128.
  • Figure 5: L1-loss evolution over training iterations for different $\lambda$ values and grid sizes.
  • ...and 7 more figures
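The convergence value quoted for Figure 2 can be checked directly. The sketch below assumes that trajectories start at $s_0$ and that each state either exits (terminating there) or advances to its successor, with $s_N$ only able to exit; the function name `chain_exit_probs` is hypothetical. Because a converged GFlowNet terminates at $s_i$ with probability $R(s_i)/Z$, the target exit probability at $s_i$ is $R(s_i)$ divided by the total reward still reachable from $s_i$.

```python
from fractions import Fraction
from typing import List


def chain_exit_probs(rewards: List[int]) -> List[Fraction]:
    """Target P_F(exit | s_i) on a directed chain s_0 -> s_1 -> ... -> s_N.

    Assumes each state either exits (terminating at s_i) or moves to s_{i+1},
    and s_N can only exit. Exact fractions avoid floating-point rounding.
    """
    probs = []
    remaining = sum(rewards)  # total reward reachable from the current state
    for r in rewards:
        probs.append(Fraction(r, remaining))
        remaining -= r  # s_i is no longer reachable after moving past it
    return probs


# Setup from Figure 2: 100 states, R(s_0) = R(s_N) = 101, R(s_i) = 1 otherwise.
rewards = [101] + [1] * 98 + [101]
probs = chain_exit_probs(rewards)
print(probs[0])  # 101/300
```

The exit probability at $s_0$ comes out to $101/300 \approx 0.337$, matching the caption, and the last state exits with probability 1. Multiplying the continuation probabilities along the chain recovers a termination probability of $R(s_i)/300$ for every state, which is the reward-proportional target distribution.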