Bifurcated Generative Flow Networks

Chunhui Li, Cheng-Hao Liu, Dianbo Liu, Qingpeng Cai, Ling Pan

TL;DR

Bifurcated GFlowNets (BN) decompose GFlowNet flows into a state flow and an edge-allocation term, replacing direct edge-flow parameterization with a bifurcated architecture that yields a more data-efficient learning signal. The method factorizes each edge flow as $F(s \rightarrow s') = F(s)\,A(s'|s)$ and minimizes a loss $\mathcal{L}_{\text{BN}}(s')$; when this loss is zero for all states, the resulting policy samples in proportion to the rewards (Theorem 4.1). On the HyperGrid, RNA sequence design, and molecule generation benchmarks, BN outperforms strong baselines, especially those relying on backward policies, with faster convergence, more accurate reward-proportional sampling, and greater diversity among high-reward samples. The work points to a practical way to scale GFlowNets to large state-action spaces and motivates further study of complex design and optimization problems.
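
To make the factorization concrete, here is a short worked derivation. It rests on two assumptions that this summary does not confirm: that the standard GFlowNet flow-matching condition applies, and that $A(\cdot|s)$ is normalized over the children of $s$. The sets $\mathrm{Pa}(s')$ and $\mathrm{Ch}(s')$ (parents and children of $s'$) and the reward $R(x)$ of a terminal state $x$ are notation introduced here for illustration. The exact form of $\mathcal{L}_{\text{BN}}$ is not given in this summary, so this is a sketch of why a loss indexed by $s'$ is plausible, not the authors' definition.

$$
\begin{aligned}
\sum_{s \in \mathrm{Pa}(s')} F(s \rightarrow s') &= \sum_{s'' \in \mathrm{Ch}(s')} F(s' \rightarrow s'') && \text{(flow matching at a non-terminal } s'\text{)} \\
\sum_{s \in \mathrm{Pa}(s')} F(s)\,A(s'|s) &= F(s') \sum_{s'' \in \mathrm{Ch}(s')} A(s''|s') = F(s') && \text{(substitute } F(s \rightarrow s') = F(s)\,A(s'|s)\text{)} \\
\sum_{s \in \mathrm{Pa}(x)} F(s)\,A(x|s) &= R(x) && \text{(terminal state } x\text{)}
\end{aligned}
$$

Under these assumptions, the constraint at each state $s'$ involves only the state flows of its parents, the allocation into $s'$, and either $F(s')$ or $R(s')$, so no separately parameterized edge flow is needed.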

Abstract

Generative Flow Networks (GFlowNets), a new family of probabilistic samplers, have recently emerged as a promising framework for learning stochastic policies that generate high-quality and diverse objects proportionally to their rewards. However, existing GFlowNets often suffer from low data efficiency due to the direct parameterization of edge flows or reliance on backward policies that may struggle to scale up to large action spaces. In this paper, we introduce Bifurcated GFlowNets (BN), a novel approach that employs a bifurcated architecture to factorize the flows into separate representations for state flows and edge-based flow allocation. This factorization enables BN to learn more efficiently from data and better handle large-scale problems while maintaining the convergence guarantee. Through extensive experiments on standard evaluation benchmarks, we demonstrate that BN significantly improves learning efficiency and effectiveness compared to strong baselines.

Paper Structure

This paper contains 28 sections, 1 theorem, 9 equations, 11 figures, 1 table.

Key Result

Theorem 4.1

If $\mathcal{L}_{\text{BN}}(s')=0$ for all states $s'$, then the edge advantage policy $A(s'|s)$ samples proportionally to the reward function.
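
The statement can be checked numerically on a toy problem. The sketch below does not implement $\mathcal{L}_{\text{BN}}$ or any training, since the loss is not spelled out in this summary. Instead it builds an exact flow on a small hypothetical DAG, defines the allocation as $A(s'|s) = F(s \rightarrow s') / F(s)$ (assumed here to be normalized over children), and verifies that following $A$ from the source reaches each terminal state with probability $R(x)/Z$. The graph, the rewards, and all function names are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy DAG (not from the paper): s0 is the source, x1..x3 are terminal.
CHILDREN = {
    "s0": ["a", "b"],
    "a": ["x1", "x2"],
    "b": ["x2", "x3"],
    "x1": [], "x2": [], "x3": [],
}
REWARD = {"x1": 1.0, "x2": 2.0, "x3": 3.0}
TOPO_ORDER = ["s0", "a", "b", "x1", "x2", "x3"]  # every parent precedes its children


def build_flows(children, reward, topo_order):
    """Construct one valid flow by pushing rewards backward through the DAG.

    Assumption: each state's flow is split evenly among its parents
    (a uniform backward split); any other valid split would also work.
    """
    parents = defaultdict(list)
    for s, kids in children.items():
        for c in kids:
            parents[c].append(s)

    state_flow, edge_flow = {}, {}
    for s in reversed(topo_order):
        if children[s]:
            # Non-terminal: state flow equals total outgoing edge flow.
            state_flow[s] = sum(edge_flow[(s, c)] for c in children[s])
        else:
            # Terminal: state flow equals the reward.
            state_flow[s] = reward[s]
        for p in parents[s]:
            edge_flow[(p, s)] = state_flow[s] / len(parents[s])
    return state_flow, edge_flow


def allocation(state_flow, edge_flow):
    """A(s'|s) = F(s -> s') / F(s), so that F(s -> s') = F(s) * A(s'|s)."""
    return {(s, c): f / state_flow[s] for (s, c), f in edge_flow.items()}


def terminal_distribution(children, alloc, topo_order):
    """Exact probability of ending in each terminal state when following A."""
    prob = defaultdict(float)
    prob[topo_order[0]] = 1.0
    for s in topo_order:
        for c in children[s]:
            prob[c] += prob[s] * alloc[(s, c)]
    return {s: prob[s] for s in topo_order if not children[s]}


state_flow, edge_flow = build_flows(CHILDREN, REWARD, TOPO_ORDER)
alloc = allocation(state_flow, edge_flow)
dist = terminal_distribution(CHILDREN, alloc, TOPO_ORDER)
Z = state_flow["s0"]  # total flow at the source equals the sum of rewards

for x, p in dist.items():
    print(x, "sampled:", round(p, 4), "target R/Z:", round(REWARD[x] / Z, 4))
```

In this construction the two printed columns agree for every terminal state, which is the behavior the theorem describes for a zero-loss solution. A learned model would only approximate it.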

Figures (11)

  • Figure 1: An illustrative example in $\beta$-lactam antibiotics. Here, we highlight a key action (red) that forms the crucial $\beta$-lactam ring which is essential for antibacterial properties, versus a secondary action (yellow) that modulates the side chain to refine specific properties without drastically altering fundamental activity.
  • Figure 2: A simple scenario demonstrating the data inefficiency problem.
  • Figure 3: The network structure of BN.
  • Figure 4: Results in the didactic task.
  • Figure 5: The HyperGrid task.
  • ...and 6 more figures

Theorems & Definitions (2)

  • Theorem 4.1
  • proof