QGFN: Controllable Greediness with Action Values

Elaine Lau, Stephen Zhewen Lu, Ling Pan, Doina Precup, Emmanuel Bengio

TL;DR

This work proposes combining the GFN policy with an action-value estimate, $Q$, to create greedier sampling policies whose greediness is controlled by a mixing parameter. It shows that several variants of the proposed method, QGFN, increase the number of high-reward samples generated across a variety of tasks without sacrificing diversity.

Abstract

Generative Flow Networks (GFlowNets; GFNs) are a family of energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, consistently biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate, $Q$, to create greedier sampling policies which can be controlled by a mixing parameter. We show that several variants of the proposed method, QGFN, are able to improve on the number of high-reward samples generated in a variety of tasks without sacrificing diversity.
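The abstract states that the GFN policy and $Q$ are combined through a mixing parameter but does not spell out the rule here. As one illustration, assumed rather than taken from the paper, the sketch below uses an epsilon-greedy-style mixture: with probability $p$ take the action with the highest $Q$, otherwise sample from the GFN forward policy $P_F$. The function names (`p_greedy_mixture`, `sample_action`) and this exact mixing rule are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns GFN forward-policy logits into P_F."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def p_greedy_mixture(gfn_logits: Sequence[float],
                     q_values: Sequence[float],
                     p: float) -> List[float]:
    """Assumed mixing rule (illustrative, not necessarily a QGFN variant):
    (1 - p) * P_F(a | s) + p * onehot(argmax_a Q(s, a)).

    p = 0 recovers the plain GFN policy; p = 1 is fully greedy in Q.
    """
    if len(gfn_logits) != len(q_values):
        raise ValueError("gfn_logits and q_values must cover the same actions")
    if not 0.0 <= p <= 1.0:
        raise ValueError("mixing parameter p must lie in [0, 1]")
    pf = softmax(gfn_logits)
    # Index of the highest-Q action (ties go to the lowest index).
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    return [(1.0 - p) * prob + (p if a == greedy else 0.0)
            for a, prob in enumerate(pf)]


def sample_action(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one action index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With uniform GFN logits over three actions and $Q$ favoring action 1, `p_greedy_mixture([0.0, 0.0, 0.0], [0.1, 0.9, 0.3], 0.5)` gives probabilities of 1/6, 2/3 and 1/6: half of the mass follows the GFN and half goes to the greedy action. Raising `p` makes sampling greedier; setting it to 0 recovers the GFN policy.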

Paper Structure (29 sections, 7 equations, 18 figures, 12 tables)

Figures (18)

  • Figure 1: Relying solely on flow functions $F$ in GFNs can be insufficient. While GFNs capture how much total reward there is, they spend time sampling from many small rewards.
  • Figure 2: Fragment-based molecule task. Left: Average rewards over the training trajectories. Center: Number of unique modes with reward exceeding 0.97 and pairwise Tanimoto similarity below 0.65. Right: Average pairwise Tanimoto similarity of the top 1000 sampled molecules, ranked by reward. Lines are the interquartile mean and standard error calculated over 5 seeds.
  • Figure 3: QM9 task. Left: Average rewards over training trajectories. Right: Number of modes with a reward above 1.10 and pairwise Tanimoto similarity less than 0.70.
  • Figure 4: RNA-binding tasks: average reward and number of modes. Left: L14RNA1 task. Right: L14RNA1+2 task. Based on 5 seeds (interquartile mean and standard error shown).
  • Figure 5: Bit sequence task, $k=1$. Interquartile mean and standard error over 5 seeds.
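The mode criterion in Figure 2 (reward above 0.97, pairwise Tanimoto similarity below 0.65) can be illustrated with a greedy counting pass, together with the average pairwise similarity of the top-ranked samples. This sketch assumes each molecule is already represented as a binary fingerprint stored as a set of on-bit indices (a hypothetical representation; in practice a cheminformatics toolkit would compute the fingerprints), and that a sample counts as a new mode only if it is below the similarity threshold against every previously accepted mode. The paper's exact procedure is not given here, and the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical representation: the set of "on" bit indices of a fingerprint.
Fingerprint = FrozenSet[int]
Sample = Tuple[float, Fingerprint]  # (reward, fingerprint)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two binary fingerprints: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def count_modes(samples: Sequence[Sample],
                reward_threshold: float = 0.97,
                sim_threshold: float = 0.65) -> int:
    """Greedy mode count (assumed procedure): a sample becomes a new mode if its
    reward exceeds reward_threshold and its similarity to every mode found so far
    is below sim_threshold. The result depends on the order of `samples`."""
    modes: List[Fingerprint] = []
    for reward, fp in samples:
        if reward <= reward_threshold:
            continue
        if all(tanimoto(fp, m) < sim_threshold for m in modes):
            modes.append(fp)
    return len(modes)


def avg_pairwise_similarity(samples: Sequence[Sample], top_k: int = 1000) -> float:
    """Mean Tanimoto similarity over all pairs among the top_k samples by reward."""
    top = sorted(samples, key=lambda s: s[0], reverse=True)[:top_k]
    sims = [tanimoto(a, b) for (_, a), (_, b) in combinations(top, 2)]
    return sum(sims) / len(sims) if sims else 0.0
```

Lower average pairwise similarity among the top samples indicates a more diverse high-reward set, which is the property the caption's right panel tracks.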