Exploring Multiple High-Scoring Subspaces in Generative Flow Networks

Xuan Yu, Xu Wang, Rui Zhu, Yudong Zhang, Yang Wang

TL;DR

This work tackles inefficient exploration in Generative Flow Networks (GFlowNets) by reframing exploration as subspace-level optimization. It introduces CMAB-GFN, a framework that uses combinatorial multi-armed bandits (CMAB) to prune the action space into compact, high-reward subspaces and trains GFlowNets within those subspaces, while periodically evaluating across all subspaces to preserve diversity. The method combines a two-phase sampling protocol, CUCB-based subspace selection with a co-occurrence-aware scoring mechanism, and architectural adjustments that counter overly deep, narrow network structures. Across bit sequence, molecule design, and RNA-binding tasks, CMAB-GFN yields higher-reward candidates, discovers more high-reward modes, and maintains diversity better than strong baselines, and ablation analyses indicate robustness to hyperparameter choices. The approach improves the efficiency and robustness of GFlowNets in structured generative domains and offers scalable, subspace-aware exploration for complex combinatorial design problems.

Abstract

As a probabilistic sampling framework, Generative Flow Networks (GFlowNets) show strong potential for constructing complex combinatorial objects through the sequential composition of elementary components. However, existing GFlowNets often suffer from excessive exploration over vast state spaces, leading to over-sampling of low-reward regions and convergence to suboptimal distributions. Effectively biasing GFlowNets toward high-reward solutions remains a non-trivial challenge. In this paper, we propose CMAB-GFN, which integrates a combinatorial multi-armed bandit (CMAB) framework with GFlowNet policies. The CMAB component prunes low-quality actions, yielding compact high-scoring subspaces for exploration. Restricting GFNs to these compact high-scoring subspaces accelerates the discovery of high-value candidates, while the exploration of different subspaces ensures that diversity is not sacrificed. Experimental results on multiple tasks demonstrate that CMAB-GFN generates higher-reward candidates than existing approaches.
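To make the pruning step concrete, the following is a minimal sketch of bandit-driven subspace selection. It assumes the textbook CUCB rule (empirical mean plus the exploration bonus $\sqrt{3\ln t/(2T_i)}$) with a top-$k$ oracle; the paper's co-occurrence-aware scoring is not reproduced here, and the names `cucb_select` and `update` are hypothetical.

```python
import math


def cucb_select(counts, means, t, k):
    """Return the indices of the k base arms with the largest CUCB scores.

    counts[i] is how often arm i has been played, means[i] is its empirical
    mean reward, and t is the current round (t >= 1). Assumption: a plain
    top-k oracle stands in for the paper's subspace-selection rule.
    """
    scores = []
    for n_i, mu_i in zip(counts, means):
        if n_i == 0:
            scores.append(float("inf"))  # unplayed arms are tried first
        else:
            bonus = math.sqrt(3.0 * math.log(t) / (2.0 * n_i))
            scores.append(mu_i + bonus)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def update(counts, means, arm, reward):
    """Incrementally update the play count and empirical mean of one arm."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]
```

In such a setup, the arms returned by `cucb_select` would define the action subset the GFlowNet is allowed to use in the next round, and `update` would be called with the rewards of the candidates sampled inside that subspace.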

Paper Structure (38 sections, 2 theorems, 28 equations, 12 figures, 10 tables, 1 algorithm)

Key Result

Lemma 2.2

Assume the environment induces a finite DAG with initial state $s_0$ and terminal set $\mathcal{X}$, and let $P_F(\cdot|\cdot;\theta)$ and $P_B(\cdot|\cdot;\theta)$ be valid forward/backward transition kernels with full support. If the TB constraints hold for all trajectories $\tau=(s_0\rightarrow\cdots\rightarrow x)$ with $x\in\mathcal{X}$, or equivalently $\mathcal{L}_{\text{TB}}(\theta)=0$, then the induced terminal distribution equals the target distribution.
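For reference, the trajectory balance (TB) condition and its consequence are usually written as below. This is the standard form from the GFlowNet literature, not a transcription of the paper's own equation; the symbols $Z_\theta$, $R(x)$, $P_F^{\top}$ and the trajectory length $n$ are assumed notation.

```latex
% Standard TB condition for a complete trajectory (assumed notation).
\[
Z_\theta \prod_{t=0}^{n-1} P_F(s_{t+1}\mid s_t;\theta)
  \;=\; R(x) \prod_{t=0}^{n-1} P_B(s_t\mid s_{t+1};\theta),
\qquad \tau=(s_0\rightarrow s_1\rightarrow\cdots\rightarrow s_n=x).
\]
% Sum both sides over all trajectories ending in x: the backward
% probabilities sum to 1, and the left side becomes Z_\theta P_F^\top(x).
\[
P_F^{\top}(x) \;=\; \frac{R(x)}{Z_\theta},
\qquad Z_\theta \;=\; \sum_{x'\in\mathcal{X}} R(x').
\]
```

The second identity for $Z_\theta$ follows by summing $P_F^{\top}(x)$ over all terminal states, since the terminal distribution must sum to one.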

Figures (12)

  • Figure 1: Illustration of action pruning. The triangle denotes the initial state, the circles denote interior states, and the squares denote terminal states. Pruning low-scoring actions (blue edges) masks candidates with low rewards (blue nodes). Candidates with high rewards (orange nodes) are then more likely to be explored, which addresses the over-exploration of low-reward candidates.
  • Figure 2: Using short action sequences as arms to transform the narrow-deep network architecture into a more balanced wide structure.
  • Figure 3: Experimental results on the Bit Sequence task. The left panel shows how different action values change as training progresses, where action values are the empirical mean rewards of each base arm, corresponding to $\hat{\mu}_i$ in Eq. (4). The right panel shows the modes discovered by different methods.
  • Figure 4: Number of modes discovered versus rounds ($\times 10^3$) on Molecule Design. The left panel counts modes with reward $R>7.5$; the right panel counts modes with reward $R>8$.
  • Figure 5: Performance comparison on RNA design tasks. Rows correspond to RNA-1, RNA-2, and RNA-3, respectively.
  • ...and 7 more figures
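The action pruning described in the Figure 1 caption can be sketched as a masked softmax over the forward policy's logits: pruned actions receive zero probability and the remaining mass is renormalized. This is an illustrative sketch under that assumption, not the paper's implementation; `masked_policy` and its arguments are hypothetical names.

```python
import math


def masked_policy(logits, keep):
    """Softmax over `logits` restricted to the actions where `keep` is True.

    Pruned actions (keep[i] is False) get probability exactly 0, so the
    sampler can never follow a low-scoring edge.
    """
    kept = [l for l, k in zip(logits, keep) if k]
    if not kept:
        raise ValueError("at least one action must remain after pruning")
    top = max(kept)  # subtract the max kept logit for numerical stability
    weights = [math.exp(l - top) if k else 0.0 for l, k in zip(logits, keep)]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `masked_policy([1.0, 2.0, 3.0], [True, False, True])` assigns zero probability to the second action and splits the remaining mass between the first and third in proportion to their exponentiated logits.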

Theorems & Definitions (5)

  • Remark 2.1: Scope of the theoretical claims
  • Lemma 2.2: TB Consistency (Zero-loss Implies Target Distribution)
  • proof
  • Lemma 2.3: Convergence of Sampling Distribution
  • proof