Investigating Generalization Behaviours of Generative Flow Networks
Lazar Atanackovic, Emmanuel Bengio
TL;DR
This work addresses how Generative Flow Networks (GFNs) generalize to unseen regions of discrete state spaces by introducing a graph-based benchmark with tractable $p(x)$ and controllable reward difficulty. It employs three experimental avenues (distilling true flows, memorization-gap analyses, and offline/off-policy training regimes) to disentangle the mechanisms behind generalization. The findings show that the learned flow functions encode structure that supports generalization, while offline and off-policy training can impede accurate $p(x)$ estimation, although reward proxies remain informative. Reward shaping is generally robust within the tested ranges. The results underscore the importance of environment structure and careful benchmarking when studying GFNs, and they point to theoretical and methodological directions for improving online and on-policy generalization in larger, real-world discrete spaces.
Abstract
Generative Flow Networks (GFlowNets, GFNs) are a generative framework for learning unnormalized probability mass functions over discrete spaces. Since their inception, GFlowNets have proven useful for learning generative models in applications where the majority of the discrete space is unvisited during training. This has inspired some to hypothesize that GFlowNets, when paired with deep neural networks (DNNs), have favorable generalization properties. In this work, we empirically verify some of the hypothesized mechanisms of generalization of GFlowNets. We accomplish this by introducing a novel graph-based benchmark environment in which reward difficulty can be easily varied, $p(x)$ can be computed exactly, and an unseen test set can be constructed to quantify generalization performance. Using this environment, we systematically test the hypothesized mechanisms of generalization of GFlowNets and put forth a set of empirical observations that summarize our findings. In particular, we find (and confirm) that the functions that GFlowNets learn to approximate have an implicit underlying structure that facilitates generalization. Surprisingly, and somewhat contrary to existing knowledge, we also find that GFlowNets are sensitive to being trained offline and off-policy. However, the reward implicitly learned by GFlowNets is robust to changes in the training distribution.
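
To illustrate what computing $p(x)$ exactly can mean in a small discrete environment, the following is a minimal sketch, not the benchmark introduced in this work. It assumes a directed acyclic state graph in which every sampled object $x$ is emitted by a stop action, and it propagates probability mass from the source state in topological order. The function `terminal_distribution`, the sentinel `STOP`, and the toy policy values are all hypothetical names and numbers chosen for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

STOP = "<stop>"  # hypothetical sentinel for the terminating action


def terminal_distribution(policy, source):
    """Exact p(x) induced by a forward policy on a DAG of states.

    policy maps each state to a dict {next_state or STOP: probability}.
    Returns a dict mapping each state x to the probability of stopping at x.
    """
    # Collect the predecessors of every state so we can order the DAG.
    preds = {s: set() for s in policy}
    for s, actions in policy.items():
        for nxt in actions:
            if nxt != STOP:
                preds.setdefault(nxt, set()).add(s)

    # reach[s] is the total probability of visiting s, summed over all paths.
    reach = {s: 0.0 for s in preds}
    reach[source] = 1.0

    p = {}
    # static_order() yields each state only after all of its predecessors.
    for s in TopologicalSorter(preds).static_order():
        for nxt, prob in policy.get(s, {}).items():
            if nxt == STOP:
                p[s] = reach[s] * prob
            else:
                reach[nxt] += reach[s] * prob
    return p


# Toy DAG (illustrative values only): s0 -> a, s0 -> b, a -> ab, b -> ab.
toy_policy = {
    "s0": {"a": 0.5, "b": 0.5},
    "a": {"ab": 0.5, STOP: 0.5},
    "b": {"ab": 0.2, STOP: 0.8},
    "ab": {STOP: 1.0},
}

p_x = terminal_distribution(toy_policy, "s0")
```

In this toy example the object `ab` can be reached along two trajectories, so its probability is the sum of both path probabilities: $p(a) = 0.25$, $p(b) = 0.4$, and $p(ab) = 0.35$, up to floating-point rounding. A single pass of this kind costs time linear in the number of edges, which is only practical when the whole state graph can be enumerated.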
