Distributional GFlowNets with Quantile Flows

Dinghuai Zhang, Ling Pan, Ricky T. Q. Chen, Aaron Courville, Yoshua Bengio

TL;DR

This work identifies a limitation of standard GFlowNets: they cannot handle stochastic rewards or account for uncertainty. It introduces Distributional GFlowNets, which model edge/state flows as distributions, parameterize them through their quantile functions, and train them with a quantile matching (QM) objective, enabling risk-sensitive policies via distortion risk measures. QM provides more informative learning signals, improving on prior methods on hypergrid, sequence generation, and molecule optimization benchmarks and yielding risk-averse behavior on the stochastic risky hypergrid task. The approach broadens the applicability of GFlowNets to uncertainty-prone domains, with potential impact on areas like drug discovery and complex combinatorial generation.

Abstract

Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers in which an agent learns a stochastic policy for generating complex combinatorial structures through a series of decision-making steps. Despite being inspired by reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution and thus providing more informative learning signals during training. By parameterizing each edge flow through its quantile function, our proposed quantile matching GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty. Moreover, we find that the distributional approach achieves substantial improvement over prior methods on existing benchmarks thanks to our enhanced training algorithm, even in settings with deterministic rewards.
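
To make the idea of learning a quantile function concrete, here is a minimal sketch of generic quantile regression with the pinball loss. It is not the paper's quantile matching objective, whose details are not given in this summary; the function `pinball_loss`, the Gaussian samples, and the grid search are all illustrative assumptions.

```python
import random


def pinball_loss(pred, targets, tau):
    """Average pinball (quantile regression) loss of a scalar prediction.

    The loss is minimized when `pred` equals the tau-quantile of `targets`.
    """
    total = 0.0
    for y in targets:
        u = y - pred
        # Under-prediction (u >= 0) is weighted by tau,
        # over-prediction (u < 0) by (1 - tau).
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(targets)


# Hypothetical data: samples from a standard normal "flow" distribution.
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]
tau = 0.9

# A crude grid search stands in for gradient descent on a neural network.
candidates = [-3.0 + 0.01 * i for i in range(601)]
best = min(candidates, key=lambda c: pinball_loss(c, samples, tau))

# Compare with the empirical tau-quantile read off the sorted samples.
empirical = sorted(samples)[int(tau * len(samples))]
print(f"pinball minimizer: {best:.2f}, empirical quantile: {empirical:.2f}")
```

In a distributional model, one such quantile estimate would be produced for each level tau, which together describe the whole distribution rather than only its mean.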

Paper Structure

This paper contains 32 sections, 3 theorems, 22 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

Suppose the reward $R(\mathbf{x})$ for object $\mathbf{x}$ is a random variable. Then, given sufficiently large capacity and computational resources, the GFlowNet obtained after training would generate objects with probability proportional to $\exp\left(\mathbb{E}[\log R(\mathbf{x})]\right)$.
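
A small numerical illustration may help here. The sketch below compares sampling proportional to $\exp(\mathbb{E}[\log R(\mathbf{x})])$, as in Proposition 1, with sampling proportional to the expected reward $\mathbb{E}[R(\mathbf{x})]$. The two objects, their reward distributions, and all function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def expected_reward(dist):
    """E[R] for a discrete reward distribution given as (reward, prob) pairs."""
    return sum(p * r for r, p in dist)


def exp_expected_log_reward(dist):
    """exp(E[log R]), the quantity appearing in Proposition 1."""
    return math.exp(sum(p * math.log(r) for r, p in dist))


def sampling_probs(reward_dists, score):
    """Normalize per-object scores into a sampling distribution."""
    weights = {x: score(d) for x, d in reward_dists.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# Hypothetical objects: "safe" always pays 2, "risky" pays 1 or 4 with equal odds.
reward_dists = {
    "safe": [(2.0, 1.0)],
    "risky": [(1.0, 0.5), (4.0, 0.5)],
}

by_mean = sampling_probs(reward_dists, expected_reward)
by_log_mean = sampling_probs(reward_dists, exp_expected_log_reward)

print("proportional to E[R]:         ", by_mean)
print("proportional to exp(E[log R]):", by_log_mean)
```

In this toy case the risky object has the higher expected reward (2.5 versus 2), yet $\exp(\mathbb{E}[\log R])$ equals 2 for both, so a sampler of the kind described in Proposition 1 would draw them equally often. By Jensen's inequality, $\exp(\mathbb{E}[\log R]) \le \mathbb{E}[R]$, so reward variance is never credited under this target.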

Figures (10)

  • Figure 1: Illustration of a distributional GFlowNet with stochastic edge flows.
  • Figure 2: A risky hypergrid environment.
  • Figure 3: Experimental results on stochastic risky hypergrid problems with different risk-sensitive policies. Top: CVaR$(0.1)$ and Wang$(-0.75)$ induce risk-averse policies, thus achieving smaller violation rates. Bottom: Risk-sensitive methods perform similarly to other baselines in terms of the number of non-risky modes captured, indicating that the proposed conservative method does not hurt standard performance.
  • Figure 4: Experimental results on the hypergrid tasks for different scale levels. Top: the $\ell_1$ error between the learned distribution density and the true target density. Bottom: the number of discovered modes across the training process. The proposed quantile matching algorithm achieves the best results across different hypergrid scales under both quantitative metrics.
  • Figure 5: The number of modes reached by each algorithm across the whole training process for the sequence generation task. QM outperforms other baselines in terms of sample efficiency.
  • ...and 5 more figures
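
The CVaR$(0.1)$ and Wang$(-0.75)$ settings named in Figure 3 are distortion risk measures. The sketch below uses their standard definitions from distributional reinforcement learning, which are assumed (not confirmed by this summary) to match the paper's: each one remaps a quantile level $\tau$ before the quantile function is evaluated. The Gaussian reward distribution and the helper names are illustrative only.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def cvar_distortion(tau, eta):
    """CVaR(eta): squeeze tau into [0, eta], keeping only the worst outcomes."""
    return eta * tau


def wang_distortion(tau, eta):
    """Wang(eta): shift tau in normal-score space; eta < 0 favors low outcomes."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(tau) + eta)


def distorted_mean(quantile_fn, distortion, eta, n=1000):
    """Average the quantile function over distorted, evenly spaced levels."""
    taus = [(i + 0.5) / n for i in range(n)]  # midpoints avoid tau = 0 and 1
    return sum(quantile_fn(distortion(t, eta)) for t in taus) / n


# Hypothetical quantile function: a Gaussian with mean 1 and std 2.
reward_quantile = NormalDist(mu=1.0, sigma=2.0).inv_cdf

neutral = distorted_mean(reward_quantile, lambda t, _eta: t, None)
cvar = distorted_mean(reward_quantile, cvar_distortion, 0.1)
wang = distorted_mean(reward_quantile, wang_distortion, -0.75)

print(f"risk-neutral mean: {neutral:.3f}")
print(f"CVaR(0.1):         {cvar:.3f}")
print(f"Wang(-0.75):       {wang:.3f}")
```

Both distorted values fall below the plain mean, which is the sense in which these settings are risk-averse: outcomes with high variance are scored by their pessimistic quantiles.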

Theorems & Definitions (6)

  • Proposition 1: informal
  • Proposition 2: quantile additivity
  • Remark 3
  • proof
  • Proposition
  • proof