On Generalization for Generative Flow Networks
Anas Krichel, Nikolay Malkin, Salem Lahlou, Yoshua Bengio
TL;DR
This work formalizes generalization in Generative Flow Networks (GFlowNets) trained via the Trajectory Balance (TB) loss, which links trajectory probabilities to an unnormalized reward $R$ through the forward policy $P_F$, the backward policy $P_B$, and the partition function estimate $Z$. It introduces a stability-focused perspective and establishes a bound showing that small reward perturbations yield controlled changes in trajectory distributions under TB, with a concrete result for uniform $P_B$. The authors provide empirical evidence by hiding parts of the reward and comparing the TB, Detailed Balance (DB), and Forward-Looking Detailed Balance (FL-DB) losses, finding that DB often generalizes better and that access to intermediate rewards (FL-DB) boosts generalization when available. Overall, the paper contributes theoretical generalization and stability frameworks for GFlowNets and demonstrates practical implications for designing robust training losses and policies across unseen regions of the reward landscape.
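To make the roles of $P_F$, $P_B$, and $Z$ concrete, here is a minimal sketch of the per-trajectory TB loss in its usual form from the GFlowNet literature: the squared difference between $\log Z + \sum_t \log P_F(s_{t+1} \mid s_t)$ and $\log R(x) + \sum_t \log P_B(s_t \mid s_{t+1})$. The function name, argument names, and toy numbers are illustrative assumptions, not taken from the paper.

```python
import math

def trajectory_balance_loss(log_Z, log_pf_steps, log_pb_steps, log_reward):
    """Squared TB residual for a single trajectory (illustrative sketch).

    log_Z        -- log of the learned partition function estimate Z
    log_pf_steps -- log P_F(s_{t+1} | s_t) for each step of the trajectory
    log_pb_steps -- log P_B(s_t | s_{t+1}) for each step of the trajectory
    log_reward   -- log R(x) of the terminal object x
    """
    forward = log_Z + sum(log_pf_steps)
    backward = log_reward + sum(log_pb_steps)
    return (forward - backward) ** 2

# Toy two-step trajectory (hypothetical numbers) where the balance holds:
# Z * 0.5 * 0.5 == R * 1 * 1 with Z = 4 and R = 1.
balanced = trajectory_balance_loss(
    log_Z=math.log(4.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[0.0, 0.0],
    log_reward=0.0,
)

# Same trajectory with a perturbed reward (R = 2): the residual becomes log(2).
perturbed = trajectory_balance_loss(
    log_Z=math.log(4.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[0.0, 0.0],
    log_reward=math.log(2.0),
)
```

In this toy setting the loss vanishes exactly when the trajectory satisfies the balance condition, and a change in the reward shows up directly as a nonzero residual, which is the quantity a stability analysis of TB would track.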
Abstract
Generative Flow Networks (GFlowNets) have emerged as an innovative learning paradigm designed to address the challenge of sampling from an unnormalized probability distribution, called the reward function. This framework learns a policy on a constructed graph, which enables sampling from an approximation of the target probability distribution through successive steps of sampling from the learned policy. To achieve this, GFlowNets can be trained with various objectives, each of which can lead to the model's ultimate goal. The aspirational strength of GFlowNets lies in their potential to discern intricate patterns within the reward function and their capacity to generalize effectively to novel, unseen parts of the reward function. This paper attempts to formalize generalization in the context of GFlowNets, to link generalization with stability, and to design experiments that assess the capacity of these models to uncover unseen parts of the reward function. The experiments focus on length generalization, meaning generalization to states that can be constructed only by trajectories longer than those seen in training.
