Investigating Generalization Behaviours of Generative Flow Networks

Lazar Atanackovic, Emmanuel Bengio

TL;DR

This work studies how Generative Flow Networks (GFNs) generalize to unseen regions of discrete state spaces. It introduces a graph-based benchmark in which $p(x)$ is tractable and reward difficulty is controllable, and it uses three experimental avenues (distilling true flows, memorization-gap analyses, and offline/off-policy training regimes) to disentangle the mechanisms behind generalization. The findings show that the learned flow functions encode structure that supports generalization, whereas offline and off-policy training can impede accurate estimation of $p(x)$, even though the implicitly learned reward remains robust to changes in the training distribution. Transforming the reward distribution does not significantly change the difficulty of approximating $p(x)$ within the tested ranges. The results underscore the importance of environment structure and careful benchmarking when studying GFNs, and they point to theoretical and methodological directions for improving online and on-policy generalization in larger, real-world discrete spaces.

Abstract

Generative Flow Networks (GFlowNets, GFNs) are a generative framework for learning unnormalized probability mass functions over discrete spaces. Since their inception, GFlowNets have proven useful for learning generative models in applications where the majority of the discrete space is unvisited during training. This has inspired some to hypothesize that GFlowNets, when paired with deep neural networks (DNNs), have favorable generalization properties. In this work, we empirically verify some of the hypothesized mechanisms of generalization of GFlowNets. We accomplish this by introducing a novel graph-based benchmark environment where reward difficulty can be easily varied, $p(x)$ can be computed exactly, and an unseen test set can be constructed to quantify generalization performance. Using this graph-based environment, we are able to systematically test the hypothesized mechanisms of generalization of GFlowNets and put forth a set of empirical observations that summarize our findings. In particular, we find (and confirm) that the functions that GFlowNets learn to approximate have an implicit underlying structure which facilitates generalization. Surprisingly -- and somewhat contrary to existing knowledge -- we also find that GFlowNets are sensitive to being trained offline and off-policy. However, the reward implicitly learned by GFlowNets is robust to changes in the training distribution.
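
For reference, the quantity the abstract calls $p(x)$ can be written out as below. This is a sketch under the standard GFlowNet convention that terminal objects are sampled in proportion to their reward; the symbols $R$, $Z$ and $\mathcal{X}$ are notation assumed here, not quoted from the text above.

$$p(x) = \frac{R(x)}{Z}, \qquad Z = \sum_{x' \in \mathcal{X}} R(x')$$

The abstract states that $p(x)$ can be computed exactly in the benchmark, which in this notation amounts to being able to evaluate the sum $Z$ over all terminal objects.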

Paper Structure (53 sections, 3 equations, 29 figures, 5 tables)


Figures (29)

  • Figure 1: (a) Training a GNN on 3 different tasks with models of varying capacity. Most of the variance comes from varying capacity. Dashed lines are for the highest capacity models. (b) Training a GFlowNet (online and on-policy) on 4 different tasks. While ordering is mostly preserved, apparent difficulty depends on the choice of metric.
  • Figure 2: (a) Training a model to distill edge flows and policies. (i) doing so recovers the intended distribution, (ii) modeling $P_F$ appears easier than modeling $F$. (b) JS-divergence and MAE gaps between SubTB(1) trained model and $P_F$ distilled model. Distilling to $P_F$ appears to yield lower distributional error, except for JS-divergence on the cliques task.
  • Figure 3: (a) Training models distilled to $P_F$ and models trained online/on-policy for a range of $\gamma$; (b) similarly for a range of $\beta$. Transforming the distribution of the reward does not significantly affect the generalization difficulty in approximating $p(x)$.
  • Figure 4: Memorization gap training curves for counting, neighbors, and cliques tasks. Maintaining flow structure in the learning problem (learning $P_F$ under shuffled $\tilde{R}$) induces a smaller memorization gap relative to the fully de-structured setting. See Table \ref{tab:memorize} for reference of experimental setup.
  • Figure 5: Evaluation curves for offline and off-policy trained GFlowNets on the neighbors task for different choices of $\mathbb{P}_{\mathcal{X}}$. (a) When training using the full dataset (no test set). (b) When training using a subset of the full dataset (90%-10% train-test split). Complete experiments for all graph generation tasks and evaluation metrics are shown in §\ref{ap:full_offline_expts}.
  • ...and 24 more figures
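
The captions above report JS-divergence and MAE between a model's distribution and the exact $p(x)$. A minimal sketch of how such metrics could be computed over an enumerable set of terminal objects follows. The function names are hypothetical, and computing the MAE on probabilities (rather than, say, log-probabilities) is an assumption; the source does not specify the paper's exact definition.

```python
import math


def normalize(weights):
    """Turn non-negative weights (e.g. rewards R(x)) into probabilities."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions on the same support."""
    # The mixture m is positive wherever p or q is, so both KL terms are finite.
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mean_absolute_error(p, q):
    """Mean absolute difference between probabilities (assumed definition)."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q)) / len(p)


if __name__ == "__main__":
    # Hypothetical rewards for four terminal objects, and a made-up model estimate.
    p_true = normalize([1.0, 2.0, 3.0, 4.0])
    p_model = normalize([1.5, 1.5, 3.0, 4.0])
    print("JS divergence:", js_divergence(p_true, p_model))
    print("MAE:", mean_absolute_error(p_true, p_model))
```

With natural logarithms the JS divergence lies between 0 (identical distributions) and $\log 2$ (disjoint supports), which makes it convenient for comparing tasks of different sizes.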