A Theory of Non-Acyclic Generative Flow Networks

Leo Maxime Brunswic, Yinchuan Li, Yushun Xu, Shangling Jui, Lizhuang Ma

TL;DR

This work generalizes Generative Flow Networks (GFlowNets) from directed acyclic graphs (DAGs) to measurable spaces, addressing cycles and continuous state spaces by introducing a measure-theoretic framework and a notion of 0-flows. It shows that conventional training losses can become unstable in non-acyclic settings and develops a family of stable losses with stabilizing regularization, framed as generalized divergences over state, edge, and path measures. Theoretical results characterize stability conditions and sampling behavior, while experiments on hypergrids, Cayley graphs, and continuous control tasks show that stable losses prevent runaway flows and improve convergence to reward-proportional distributions. The approach broadens the applicability of GFlowNets, offering practical guidance for training in spaces with cycles and continuous components, with implications for MCMC-like sampling and policy-based generation.
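
The divergences mentioned above compare two positive measures through a generator function $f$. As an illustration only, the sketch below evaluates the discrete form $D_f(P \| Q) = \sum_x Q(x)\, f(P(x)/Q(x))$ with the two generators named in the Figure 3 caption: $f(x)=(1-x)^2$ ($\chi^2$) and $f(x)=|1-x|$ (total variation, without the usual factor $1/2$). The function names and the toy measures are hypothetical; how the paper assembles losses such as $\mathcal{L}_{FM,f,\nu}$ from state, edge, and path measures, and the role of $\nu$, is not reproduced here.

```python
from typing import Callable, Dict

Measure = Dict[str, float]  # unnormalized positive measure on a finite set


def f_divergence(p: Measure, q: Measure, f: Callable[[float], float]) -> float:
    """Discrete f-divergence D_f(p || q) = sum_x q(x) * f(p(x) / q(x)).

    Sums over the support of q (required to be positive there); p is treated
    as 0 where it is missing. As a simplification, mass of p outside the
    support of q is ignored.
    """
    total = 0.0
    for x, qx in q.items():
        if qx <= 0:
            raise ValueError(f"q must be positive on its support, got q[{x!r}] = {qx}")
        total += qx * f(p.get(x, 0.0) / qx)
    return total


def chi2_generator(x: float) -> float:
    """f(x) = (1 - x)^2, the chi-square generator from the Figure 3 caption."""
    return (1.0 - x) ** 2


def tv_generator(x: float) -> float:
    """f(x) = |1 - x|, the total-variation generator from the Figure 3 caption."""
    return abs(1.0 - x)


# Hypothetical toy measures: a learned terminal flow vs. a target reward.
learned = {"a": 2.0, "b": 1.0, "c": 1.0}
reward = {"a": 1.0, "b": 1.0, "c": 2.0}

print(f_divergence(learned, reward, chi2_generator))
print(f_divergence(learned, reward, tv_generator))
```

With these toy values the $\chi^2$ generator gives $\sum_x (P(x)-Q(x))^2/Q(x) = 1.5$ and the total-variation generator gives $\sum_x |P(x)-Q(x)| = 2$; both vanish when $P = Q$.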

Abstract

GFlowNets are a novel flow-based method for learning a stochastic policy that generates objects through a sequence of actions, with probability proportional to a given positive reward. We contribute to relaxing the hypotheses that limit the range of application of GFlowNets, in particular acyclicity (or lack thereof). To this end, we extend the theory of GFlowNets to measurable spaces, which include continuous state spaces, without cycle restrictions, and we provide a generalization of cycles in this setting. We show that the losses used so far push flows to get stuck in cycles, and we define a family of losses that solves this issue. Experiments on graphs and continuous tasks validate these principles.
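
A toy example makes the claim about cycles concrete. The sketch below assumes a hypothetical four-state graph $s_0 \to a$, $a \to b$, $b \to a$, $b \to x$ with $x$ terminal, and stores an edge flow as a dictionary of edge weights. Adding $c$ units circulating on $a \to b \to a$ is taken here as the graph analogue of a 0-flow: it changes in-flow and out-flow by the same amount at every interior state, so flow-matching constraints that held before still hold, while the forward policy $P(s \to s') = F(s \to s')/\text{out}(s)$ loops more often. The two losses are simplified flow-matching-style losses chosen for illustration (a log-ratio form in the spirit of bengio2021flow and a squared-difference form); they are not claimed to be the exact losses analyzed in the paper.

```python
import math
from typing import Dict, Tuple

Edge = Tuple[str, str]
EdgeFlow = Dict[Edge, float]

# Hypothetical toy graph: s0 -> a, a -> b, b -> a (closes a cycle), b -> x (x terminal).
INTERIOR = ("a", "b")  # states where in-flow should equal out-flow


def inflow(F: EdgeFlow, s: str) -> float:
    return sum(v for (_, dst), v in F.items() if dst == s)


def outflow(F: EdgeFlow, s: str) -> float:
    return sum(v for (src, _), v in F.items() if src == s)


def add_cycle_flow(F: EdgeFlow, c: float) -> EdgeFlow:
    """Return F plus c units circulating on a -> b -> a (a toy 0-flow)."""
    G = dict(F)
    G[("a", "b")] += c
    G[("b", "a")] += c
    return G


def diff_fm_loss(F: EdgeFlow) -> float:
    """Squared-difference matching loss: sum over interior s of (in(s) - out(s))^2."""
    return sum((inflow(F, s) - outflow(F, s)) ** 2 for s in INTERIOR)


def log_fm_loss(F: EdgeFlow) -> float:
    """Log-ratio matching loss: sum over interior s of (log in(s) - log out(s))^2."""
    return sum((math.log(inflow(F, s)) - math.log(outflow(F, s))) ** 2 for s in INTERIOR)


def expected_length(F: EdgeFlow) -> float:
    """Expected path length from s0 under P(s -> s') = F(s -> s') / out(s).

    A path is s0 -> a, then rounds of a -> b followed by either b -> x (stop,
    probability p = F(b -> x) / out(b)) or b -> a. Expected steps from a are
    2 / p, so the total expected length is 1 + 2 / p.
    """
    p_stop = F[("b", "x")] / outflow(F, "b")
    return 1.0 + 2.0 / p_stop


valid = {("s0", "a"): 1.0, ("a", "b"): 1.0, ("b", "a"): 0.0, ("b", "x"): 1.0}
imperfect = {("s0", "a"): 1.0, ("a", "b"): 2.0, ("b", "a"): 0.0, ("b", "x"): 1.0}

for c in (0.0, 1.0, 10.0, 100.0):
    V = add_cycle_flow(valid, c)
    W = add_cycle_flow(imperfect, c)
    print(f"c={c:>5}: valid diff-loss={diff_fm_loss(V):.1f}, length={expected_length(V):.1f}; "
          f"imperfect log-loss={log_fm_loss(W):.4f}, diff-loss={diff_fm_loss(W):.1f}")
```

In this toy, the valid flow keeps a zero squared-difference loss for every $c$ while its expected path length grows as $2c + 3$. On the imperfect flow the squared-difference loss stays at $2$, and the log-ratio loss, $2\log^2\frac{2+c}{1+c}$, shrinks toward $0$ as $c$ grows, so an optimizer could lower it by inflating the circulation rather than by correcting $F(a \to b)$.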

Paper Structure (38 sections, 26 theorems, 69 equations, 7 figures, 1 table)

Key Result

Lemma 1

A sufficient condition for stability is that, for every $0$-flow $F_0$ that is a subflow of each $F_i$, …

Figures (7)

  • Figure 1: Paths generated on a grid. Left: non-acyclic loss (ours); right: FM loss (bengio2021flow).
  • Figure 2: A depiction of an $R$-edgeflow on a measurable space. The initial flow $F({s_0} \rightarrow \cdot)$ is in blue and the reward $R$ is in green. A possible 0-flow is highlighted in red.
  • Figure 3: GFlowNets were trained with various unstable and stable losses on the 2DW20 grid: $\mathcal{L}_{FM, stable} = \mathcal{L}_{FM,\Delta,f,0,\nu}$ with $f(x)=x^2$; $\mathcal{L}_{FM,\chi^2,\nu}$ and $\mathcal{L}_{FM, TV,\nu}$ are instances of $\mathcal{L}_{FM,f,\nu}$ as above with $f(x)=(1-x)^2$ and $f(x)=|1-x|$ respectively. Top: evolution of averaged reward during training. Bottom: evolution of average lengths of sampled paths.
  • Figure 4: Comparison results of Stable-GFlowNets (ours), GFlowNets, and a Metropolis-Hastings baseline on the Cayley graph of $\mathfrak S_{20}$ generated by a transposition, a $20$-cycle and its inverse. The reward is $R_1$ as above with $k=1$ and $c=20$. Paths are drawn with a cut-off length of 80. An optimal strategy yields an expected reward of 20 with an expected length of 5. Top: evolution of averaged reward during training. Bottom: average lengths of sampled paths.
  • Figure 5: Comparison results of Stable-CFlowNets (ours), CFlowNets, DDPG, TD3, SAC and PPO on Point-Robot-Sparse. Left: Number of valid-distinctive trajectories generated under 5000 explorations. Right: The average reward of different methods.
  • ...and 2 more figures
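
The Cayley-graph experiment of Figure 4 is non-acyclic by construction. The sketch below builds the three generators of $\mathfrak S_{20}$ named in its caption under assumed conventions: permutations of $\{0,\dots,19\}$ in one-line notation, the transposition $(0\;1)$, the cycle $i \mapsto i+1 \bmod 20$, and out-edges given by right multiplication by a generator. These conventions and all names are assumptions, not the paper's code. The checks show that every state lies on directed cycles (applying the transposition twice, or the cycle followed by its inverse, returns to the start), so the state graph is not a DAG. An adjacent transposition together with the $n$-cycle is a standard generating set of $\mathfrak S_n$. The reward $R_1$ and the samplers are not reproduced.

```python
from typing import Dict, List, Tuple

Perm = Tuple[int, ...]  # one-line notation: p[i] is the image of i

N = 20


def compose(p: Perm, q: Perm) -> Perm:
    """Return p o q, i.e. apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def power(p: Perm, k: int) -> Perm:
    """Return p composed with itself k times (identity for k = 0)."""
    result: Perm = tuple(range(len(p)))
    for _ in range(k):
        result = compose(p, result)
    return result


identity: Perm = tuple(range(N))
transposition: Perm = (1, 0) + tuple(range(2, N))      # swaps 0 and 1
cycle: Perm = tuple((i + 1) % N for i in range(N))      # i -> i + 1 mod N
cycle_inv: Perm = tuple((i - 1) % N for i in range(N))  # i -> i - 1 mod N

GENERATORS: Dict[str, Perm] = {"t": transposition, "c": cycle, "c_inv": cycle_inv}


def neighbors(state: Perm) -> Dict[str, Perm]:
    """Out-edges of a state: right-multiply it by each generator (assumed convention)."""
    return {name: compose(state, g) for name, g in GENERATORS.items()}


def walk(state: Perm, moves: List[str]) -> Perm:
    """Follow a sequence of generator names starting from `state`."""
    for name in moves:
        state = neighbors(state)[name]
    return state


start = compose(cycle, transposition)  # an arbitrary non-identity state
print(walk(start, ["t", "t"]) == start)      # directed cycle of length 2
print(walk(start, ["c", "c_inv"]) == start)  # directed cycle of length 2
print(walk(start, ["c"] * N) == start)       # directed cycle of length 20
```

Because these short cycles exist at every state, a sampler on this graph can revisit states indefinitely, which is the setting where the paper reports that path lengths must be controlled.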

Theorems & Definitions (53)

  • Definition 1
  • Definition 2
  • Definition 3: Stability
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Theorem 3: Instability of divergence-based losses
  • Theorem 4
  • Example 1: Stable FM loss
  • Example 2: Stable DB loss
  • ...and 43 more