Improving GFlowNets with Monte Carlo Tree Search

Nikita Morozov; Daniil Tiapkin; Sergey Samsonov; Alexey Naumov; Dmitry Vetrov

TL;DR

The paper strengthens planning in Generative Flow Networks (GFlowNets) by integrating Monte Carlo Tree Search (MCTS), using the MENTS algorithm (Xiao et al., 2019) to estimate entropy-regularized Q-values. Applying MENTS on top of SoftDQN enables look-ahead planning during both training and inference, aligning forward policies with the trajectory balance framework. Empirically, MENTS improves sample efficiency and generation fidelity on the Hypergrid and Bit Sequence tasks; the gains grow with the number of MCTS rounds and improve further when MENTS is used for both training and inference. The approach exploits the DAG structure and soft RL formulation of GFlowNets to provide a principled planning mechanism that can be extended to other GFlowNet variants and domains.
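
To make "entropy-regularized Q-values" concrete, here is a minimal sketch of the soft (log-sum-exp) backup that MENTS-style search relies on. It is not the authors' implementation: the function names, the temperature default `tau=1.0`, and the toy numbers are assumptions chosen for illustration.

```python
import math

def soft_value(q_values, tau=1.0):
    """Soft value of a state: tau * log(sum_a exp(Q(s, a) / tau))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def softmax_policy(q_values, tau=1.0):
    """Policy with probabilities proportional to exp(Q(s, a) / tau)."""
    v = soft_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]

def soft_backup(reward, child_q_values, tau=1.0):
    """One-step soft backup: Q(s, a) = r(s, a) + V_soft(s')."""
    return reward + soft_value(child_q_values, tau)

# Toy example (hypothetical numbers): a child state with two equally good actions.
child_q = [0.0, 0.0]
q_parent = soft_backup(reward=1.0, child_q_values=child_q)
policy = softmax_policy(child_q)
```

With equal Q-values the soft value exceeds the maximum by `tau * log(2)`, which is the entropy bonus for having two equally good actions, and the resulting policy is uniform.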

Abstract

Generative Flow Networks (GFlowNets) treat sampling from distributions over compositional discrete spaces as a sequential decision-making problem, training a stochastic policy to construct objects step by step. Recent studies have revealed strong connections between GFlowNets and entropy-regularized reinforcement learning. Building on these insights, we propose to enhance planning capabilities of GFlowNets by applying Monte Carlo Tree Search (MCTS). Specifically, we show how the MENTS algorithm (Xiao et al., 2019) can be adapted for GFlowNets and used during both training and inference. Our experiments demonstrate that this approach improves the sample efficiency of GFlowNet training and the generation fidelity of pre-trained GFlowNet models.

Paper Structure (15 sections, 18 equations, 2 figures, 1 table, 1 algorithm)

Introduction
Background
GFlowNets
GFlowNets as Soft RL
Method
MENTS for GFlowNets
Experiments
Hypergrid Environment
Bit Sequence Generation
Conclusion
Algorithm Details
Connection to GFlowNet State and Edge Flows
Experimental Details
Hypergrid
Bit Sequences

Figures (2)

Figure 1: $L^1$ distance between target and empirical sample distributions over the course of training on the hypergrid environment. Numbers next to MENTS in the legend correspond to the maximum number of MCTS rounds $N(s_{\mathrm{root}})$.
Figure 2: Spearman correlation between $R$ and $P_{\theta}$ on a test set for varying $n$ and $k$ in the bit sequence generation task. MENTS is used here only at the inference stage.
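
The two evaluation metrics named in the captions can be sketched in plain Python. This is an illustrative sketch, not the paper's evaluation code: the helper names are hypothetical, and whether the paper normalizes the $L^1$ distance or handles ties differently is not stated here.

```python
import math
from collections import Counter

def l1_distance(samples, target_probs):
    """Sum over x of |empirical(x) - target(x)| (unnormalized, by assumption)."""
    counts = Counter(samples)
    n = len(samples)
    support = set(target_probs) | set(counts)
    return sum(abs(counts[x] / n - target_probs.get(x, 0.0)) for x in support)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy data (hypothetical): four samples against a two-outcome target distribution.
dist = l1_distance(["a", "a", "b", "c"], {"a": 0.5, "b": 0.5})
# Toy rewards R and model probabilities P_theta with the same ordering.
rho = spearman([1.0, 2.0, 3.0], [0.1, 0.2, 0.7])
```

A lower `l1_distance` means the sampler's empirical distribution is closer to the target, while a `spearman` value near 1 means the model ranks test objects in the same order as the reward does.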
