Maximum entropy GFlowNets with soft Q-learning
Sobhan Mohammadpour, Emmanuel Bengio, Emma Frejinger, Pierre-Luc Bacon
TL;DR
This work builds a bridge between entropy-regularized reinforcement learning and Generative Flow Networks by designing a reward that yields sampling proportional to an unnormalized target $\tilde{p}$ under the soft Bellman equations. It introduces generative soft Q-learning (GSQL) and the maximum entropy GFN (max-ent GFN), where the backward policy is $q(s,a|s')=\frac{n(s)}{n(s')}$ and entropy is maximized over feasible flows, guaranteeing the maximum achievable flow entropy in general. The authors show that $\log n$ can be learned via the inverted MDP and that PCL and trajectory/balance constraints align under this framework, yielding a unique, high-entropy solution. Empirically, max-ent GFNs improve exploration and mode coverage on structured MDPs, including tree- and graph-building tasks like sEH and QM9, while GSQL may fail on larger combinatorial spaces. The results highlight the practical viability of leveraging entropy-regularized RL tools for GFNs and point to broad applicability in combinatorial sampling and molecule design.
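To make the backward policy $q(s,a|s')=\frac{n(s)}{n(s')}$ concrete, the following is a minimal sketch on a hypothetical toy DAG. It assumes $n(s)$ denotes the number of distinct paths from the initial state to $s$, and it computes $n$ by exact counting rather than by learning $\log n$ as the paper does. The graph, state names, and helper functions are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy DAG: edges point from parent to child; "s0" is the source.
EDGES = [("s0", "a"), ("s0", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("c", "d")]


def count_paths(edges, source):
    """Return n(s), the number of distinct source -> s paths, via a topological sweep."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {source}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))
    n = {s: 0 for s in nodes}
    n[source] = 1
    frontier = [s for s in nodes if indegree[s] == 0]
    while frontier:
        u = frontier.pop()
        for v in children[u]:
            n[v] += n[u]          # every path to u extends to a path to v
            indegree[v] -= 1
            if indegree[v] == 0:  # all parents of v have been processed
                frontier.append(v)
    return n


def maxent_backward(edges, n):
    """Backward probability q(s | s') = n(s) / n(s') for every edge s -> s'."""
    return {(u, v): n[u] / n[v] for u, v in edges}


n = count_paths(EDGES, "s0")
q = maxent_backward(EDGES, n)
print(n["d"], q[("c", "d")])
```

Because the parents' path counts sum to the child's path count, the probabilities over the parents of any state sum to one. Multiplying them along any path back from a state $x$ telescopes to $1/n(x)$, so every path into $x$ is equally likely.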
Abstract
Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods. While GFNs draw inspiration from maximum entropy reinforcement learning (RL), the connection between the two has largely been unclear and seemingly applicable only in specific cases. This paper addresses the connection by constructing an appropriate reward function, thereby establishing an exact relationship between GFNs and maximum entropy RL. This construction allows us to introduce maximum entropy GFNs, which, in contrast to GFNs with uniform backward policy, achieve the maximum entropy attainable by GFNs without constraints on the state space.
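The contrast with a uniform backward policy can be illustrated on a small example. The sketch below assumes entropy is measured over complete trajectories conditioned on a terminal state; since the terminal distribution is fixed by the target, this conditional term is the only part the backward policy can change. The DAG and all function names are hypothetical.

```python
import math

# Hypothetical toy DAG, given as child -> list of parents; "s0" is the source.
PARENTS = {"a": ["s0"], "b": ["s0"], "c": ["a", "b"], "d": ["a", "c"]}


def num_paths(s):
    """n(s): number of distinct paths from the source to s."""
    return 1 if s == "s0" else sum(num_paths(p) for p in PARENTS[s])


def uniform_pb(parent, child):
    """Uniform backward policy: each parent of `child` is equally likely."""
    return 1.0 / len(PARENTS[child])


def maxent_pb(parent, child):
    """Path-count backward policy: n(parent) / n(child)."""
    return num_paths(parent) / num_paths(child)


def path_probs(state, pb):
    """Probabilities of all backward paths from `state` to the source under `pb`."""
    if state == "s0":
        return [1.0]
    return [pb(p, state) * rest for p in PARENTS[state] for rest in path_probs(p, pb)]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


h_uniform = entropy(path_probs("d", uniform_pb))
h_maxent = entropy(path_probs("d", maxent_pb))
print(h_uniform, h_maxent, math.log(num_paths("d")))
```

In this toy graph the uniform backward policy puts probability 1/2 on the short path into `d` and 1/4 on each of the two longer ones, whereas the path-count policy weights all three equally and reaches the upper bound $\log n(d)$.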
