QGFN: Controllable Greediness with Action Values

Elaine Lau; Stephen Zhewen Lu; Ling Pan; Doina Precup; Emmanuel Bengio

TL;DR

This work combines the GFN policy with an action-value estimate, $Q$, to create greedier sampling policies whose greediness is controlled by a mixing parameter. Several variants of the resulting method, QGFN, generate more high-reward samples across a variety of tasks without sacrificing diversity.

Abstract

Generative Flow Networks (GFlowNets; GFNs) are a family of energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, consistently biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate, $Q$, to create greedier sampling policies which can be controlled by a mixing parameter. We show that several variants of the proposed method, QGFN, are able to improve on the number of high-reward samples generated in a variety of tasks without sacrificing diversity.
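As a rough illustration of what "combining the GFN policy with $Q$ under a mixing parameter" could look like, the sketch below mixes a forward-policy distribution with a greedy one-hot distribution over $Q$. This is only one plausible reading, suggested by the "$p$-greedy" variant named in the section list; the function names, the array inputs, and the exact mixture form are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def p_greedy_mixture(pf_probs, q_values, p):
    """Hypothetical helper: mix a GFN forward policy with a greedy policy on Q.

    pf_probs: probabilities the GFN policy assigns to each available action.
    q_values: action-value estimates for the same actions.
    p:        mixing parameter in [0, 1]; larger means greedier (assumed form).
    """
    pf_probs = np.asarray(pf_probs, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # One-hot distribution that puts all mass on the highest-Q action.
    greedy = np.zeros_like(pf_probs)
    greedy[np.argmax(q_values)] = 1.0
    # Convex combination: p = 0 recovers the GFN policy, p = 1 is fully greedy.
    return (1.0 - p) * pf_probs + p * greedy


def sample_action(pf_probs, q_values, p, rng):
    """Draw one action index from the mixed distribution."""
    mixed = p_greedy_mixture(pf_probs, q_values, p)
    return int(rng.choice(len(mixed), p=mixed))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pf = [0.5, 0.3, 0.2]   # toy GFN policy over three actions
    q = [0.1, 0.9, 0.4]    # toy action values; action 1 is the greedy choice
    print(p_greedy_mixture(pf, q, 0.5))
    print(sample_action(pf, q, 0.5, rng))
```

Under this assumed form, setting `p = 0` samples exactly from the GFN policy, while raising `p` shifts probability mass toward the action that `Q` rates highest.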

Paper Structure (29 sections, 7 equations, 18 figures, 12 tables)

Introduction
Background and Related Work
Related Work
Motivation
QGFN: controllable greediness through $Q$
Main results
Analysis of results
Method Analysis
Conclusion and Discussion
Analysing $p$-greedy
Additional experiments and analyses
Using $Q$ from different behavior policies
Trying other QGFN variants at inference
QGFN variants with different objective
Exploring weight sharing in QGFN
...and 14 more sections

Figures (18)

Figure 1: Solely relying on flow functions $F$ in GFNs can be insufficient. While GFNs capture how much stuff there is, they spend time sampling from lots of small rewards.
Figure 2: Fragment-based molecule task. Left: Average rewards over the training trajectories. Center: Number of unique modes with a reward threshold exceeding 0.97 and pairwise Tanimoto similarity score less than 0.65. Right: Average pairwise Tanimoto similarity score for the top 1000 molecules sampled by reward. Lines are the interquartile mean and standard error calculated over 5 seeds.
Figure 3: QM9 task. Left: Average rewards over training trajectories. Right: Number of modes with a reward above 1.10 and pairwise Tanimoto similarity less than 0.70.
Figure 4: RNA-binding tasks, average reward and number of modes. Left: L14RNA1 task. Right: L14RNA1+2 task. Lines are the interquartile mean and standard error over 5 seeds.
Figure 5: Bit sequence task, $k=1$. Interquartile mean and standard error over 5 seeds.
...and 13 more figures
