Loss-Guided Auxiliary Agents for Overcoming Mode Collapse in GFlowNets
Idriss Malek, Aya Laajil, Abhijith Sharma, Eric Moulines, Salem Lahlou
TL;DR
This work tackles mode collapse in Generative Flow Networks (GFlowNets) by introducing Loss-Guided GFlowNets (LGGFN), in which an auxiliary GFlowNet is steered by the main model's training loss through the augmented reward $R_{aux} = R_{main} + \lambda \mathcal{L}_{main}$. The main idea is to bias exploration toward high-loss, underexplored regions while keeping training stable by mixing on-policy trajectories with auxiliary-sampled ones. The approach improves exploration efficiency and sample diversity across four benchmarks (Hypergrid, bit sequences, Bayesian structure learning, and mRNA design); on a challenging sequence generation task it finds over 40x more unique modes and reduces exploration error by about 99%. The paper also provides theoretical intuition (implicit curriculum and gradient variance reduction) and discusses robustness and potential extensions, presenting LGGFN as a simple, generalizable enhancement for GFlowNets that mitigates mode collapse and accelerates discovery in complex, sparse-reward domains such as scientific design.
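As a rough illustration of the two ingredients named above, the following Python sketch computes the augmented reward $R_{aux} = R_{main} + \lambda \mathcal{L}_{main}$ for one trajectory and assembles a training batch from on-policy and auxiliary trajectories. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `auxiliary_reward` and `mixed_batch`, the per-trajectory scalar loss, the `aux_fraction` mixing parameter, and the uniform subsampling are all hypothetical choices made for the example.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def auxiliary_reward(r_main: float, loss_main: float, lam: float = 1.0) -> float:
    """Return R_aux = R_main + lam * L_main for a single trajectory.

    Assumption: loss_main is the main model's (non-negative) training loss
    evaluated on that trajectory, and lam is the weighting coefficient lambda.
    """
    if loss_main < 0:
        raise ValueError("loss_main is expected to be non-negative")
    return r_main + lam * loss_main


def mixed_batch(
    on_policy: Sequence[T],
    auxiliary: Sequence[T],
    batch_size: int,
    aux_fraction: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Build a batch mixing on-policy and auxiliary-sampled trajectories.

    Assumption: a fixed fraction of the batch comes from the auxiliary agent;
    the actual mixing rule used in the paper may differ.
    """
    rng = rng or random.Random()
    # Number of auxiliary trajectories, capped by what is available.
    n_aux = min(round(batch_size * aux_fraction), len(auxiliary))
    # Fill the remainder with on-policy trajectories, also capped.
    n_main = min(batch_size - n_aux, len(on_policy))
    return rng.sample(list(on_policy), n_main) + rng.sample(list(auxiliary), n_aux)
```

With `lam = 0` the auxiliary reward reduces to the main reward, so the auxiliary agent would target the same distribution as the main one; larger values of `lam` weight high-loss trajectories more heavily.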
Abstract
Although Generative Flow Networks (GFlowNets) are designed to capture multiple modes of a reward function, they often suffer from mode collapse in practice, getting trapped in early-discovered modes and requiring prolonged training to find diverse solutions. Existing exploration techniques often rely on heuristic novelty signals. We propose Loss-Guided GFlowNets (LGGFN), a novel approach where an auxiliary GFlowNet's exploration is directly driven by the main GFlowNet's training loss. By prioritizing trajectories where the main model exhibits high loss, LGGFN focuses sampling on poorly understood regions of the state space. This targeted exploration significantly accelerates the discovery of diverse, high-reward samples. Empirically, across diverse benchmarks including grid environments, structured sequence generation, Bayesian structure learning, and biological sequence design, LGGFN consistently outperforms baselines in exploration efficiency and sample diversity. For instance, on a challenging sequence generation task, it discovered over 40 times more unique valid modes while simultaneously reducing the exploration error metric by approximately 99%.
