Training-free Composition of Pre-trained GFlowNets for Multi-Objective Generation

Seokwon Yoon, Youngbin Choi, Seunghyuk Cho, Seungbeom Lee, MoonJeong Park, Dongwoo Kim

TL;DR

This work proposes a training-free mixing policy that composes pre-trained GFlowNets at inference time, enabling rapid adaptation without finetuning or retraining. The authors prove that the method exactly recovers the target distribution under linear scalarization, and they quantify the approximation quality for nonlinear operators through a distortion factor.

Abstract

Generative Flow Networks (GFlowNets) learn to sample diverse candidates in proportion to a reward function, making them well-suited for scientific discovery, where exploring multiple promising solutions is crucial. Extending GFlowNets to multi-objective settings has attracted growing interest since real-world applications often involve multiple, conflicting objectives. However, existing approaches require additional training for each set of objectives, limiting their applicability and incurring substantial computational overhead. We propose a training-free mixing policy that composes pre-trained GFlowNets at inference time, enabling rapid adaptation without finetuning or retraining. Importantly, our framework is flexible, capable of handling diverse reward combinations ranging from linear scalarization to complex nonlinear logical operators, which are often handled separately in previous literature. We prove that our method exactly recovers the target distribution for linear scalarization and quantify the approximation quality for nonlinear operators through a distortion factor. Experiments on a synthetic 2D grid and real-world molecule-generation tasks demonstrate that our approach achieves performance comparable to baselines that require additional training.

Paper Structure

This paper contains 63 sections, 5 theorems, 31 equations, 12 figures, 12 tables.

Key Result

Proposition 4.1

Given $k$ GFlowNets with terminating distributions $p_i(x) \propto R_i(x)$, the mixing policy (eq:mixing_policy) under linear scalarization exactly realizes the target distribution; that is, the distribution $p_M$ induced by our mixing policy coincides with the target.
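The exactness claim can be checked numerically on a toy problem. The sketch below is not the paper's mixing policy, whose equation is not reproduced in this summary. It assumes a standard construction: at each state, the forward policies are averaged with weights proportional to $\omega_i u_i(s)$, where $u_i(s)$ is the probability that policy $i$ reaches state $s$. The toy DAG, the policies `pi_1` and `pi_2`, the scalarization weights `w`, and the partition functions `Z` are all hypothetical. Reaching probabilities are computed by exact enumeration, which is feasible only on tiny graphs.

```python
# Hypothetical toy DAG: s0 -> {a, b}, a -> {x, y}, b -> {y, z}; x, y, z are terminal.
ORDER = ["s0", "a", "b"]      # non-terminal states in topological order
TERMINALS = ["x", "y", "z"]

# Two hypothetical forward policies standing in for pre-trained GFlowNets.
pi_1 = {"s0": {"a": 0.8, "b": 0.2}, "a": {"x": 0.5, "y": 0.5}, "b": {"y": 0.5, "z": 0.5}}
pi_2 = {"s0": {"a": 0.1, "b": 0.9}, "a": {"x": 0.2, "y": 0.8}, "b": {"y": 0.3, "z": 0.7}}


def reach_probs(policy):
    """Probability of reaching each state, propagated in topological order."""
    u = {"s0": 1.0}
    for s in ORDER:
        for child, prob in policy[s].items():
            u[child] = u.get(child, 0.0) + u[s] * prob
    return u


def mix_policies(policies, omegas):
    """Average the policies at each state with weights omega_i * u_i(s) (assumed form)."""
    reach = [reach_probs(p) for p in policies]
    mixed = {}
    for s in ORDER:
        weights = [om * u[s] for om, u in zip(omegas, reach)]
        total = sum(weights)
        mixed[s] = {
            child: sum(wt * p[s][child] for wt, p in zip(weights, policies)) / total
            for child in policies[0][s]
        }
    return mixed


# Linear scalarization: sum_i w_i R_i(x) = sum_i w_i Z_i p_i(x), so omega_i is
# proportional to w_i * Z_i. Both w and Z are made-up numbers for illustration.
w = [0.3, 0.7]
Z = [2.0, 5.0]
raw = [wi * zi for wi, zi in zip(w, Z)]
omegas = [r / sum(raw) for r in raw]

p1, p2 = reach_probs(pi_1), reach_probs(pi_2)
p_mix = reach_probs(mix_policies([pi_1, pi_2], omegas))
target = {x: omegas[0] * p1[x] + omegas[1] * p2[x] for x in TERMINALS}
max_err = max(abs(p_mix[x] - target[x]) for x in TERMINALS)
print(max_err)  # should be zero up to floating-point error
```

The agreement follows by induction over the DAG: if the mixed reaching probability at every parent state equals $\sum_i \omega_i u_i(s)$, the state-dependent weights cancel the normalizer, and the same identity holds at every child.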

Figures (12)

  • Figure 1: Qualitative result of scalarization on a 2D grid domain. We visualize the density on each grid of the true distribution (left), MOGFN (middle), and ours (right).
  • Figure 2: Qualitative result of logical operations on a 2D grid domain. We visualize the density on each grid of the true distribution (left), classifier guidance (middle), and ours (right).
  • Figure 3: Distortion factor $\delta(x) = u_M(x)/N_M(x)$ vs. unnormalized target density $\mathcal{G}(p_1(x), p_2(x))$ on the 2D grid domain. Red points indicate outliers, defined as states where $\delta(x)$ falls outside $[Q_1 - 1.5 \cdot \mathrm{IQR}, Q_3 + 1.5 \cdot \mathrm{IQR}]$ (gray dotted lines), where $Q_1$, $Q_3$ are the 25th/75th percentiles and $\mathrm{IQR} = Q_3 - Q_1$. The green dashed line marks $1/Z_M$ with $Z_M = \sum_x \mathcal{G}(p_1(x), p_2(x))$. Additional results are in \ref{fig:distortion_scatter}.
  • Figure 4: Density visualization of $p_{\mathrm{SEH}}$, $p_{\mathrm{QED}}$, and the composed distribution $p_M$ induced by our mixing policy via scalarization on fragment-based molecule generation. Each distribution is estimated from 5,000 samples. We vary the weights as (a) $p_M \propto (0.3 \cdot R_{\mathrm{SEH}} + 0.7 \cdot R_{\mathrm{QED}})^{32}$ and (b) $p_M \propto (0.7 \cdot R_{\mathrm{SEH}} + 0.3 \cdot R_{\mathrm{QED}})^{32}$.
  • Figure 5: Distributions $p_M$ induced by the classifier-guided mixing policy and by our mixing policy for fragment-based molecule generation. Each distribution is estimated from 5,000 samples. Base distributions $p_{\mathrm{SEH}}$ and $p_{\mathrm{SA}}$ are shown alongside the composed distribution. (a) Classifier-guided mixing policy on $p_{\mathrm{SEH}} \otimes p_{\mathrm{SA}}$. (b) Ours on $p_{\mathrm{SEH}} \otimes p_{\mathrm{SA}}$. (c) Classifier-guided mixing policy on $p_{\mathrm{SA}} \ \ p_{\mathrm{SEH}}$. (d) Ours on $p_{\mathrm{SA}} \ \ p_{\mathrm{SEH}}$.
  • ...and 7 more figures
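
The outlier rule in the Figure 3 caption is the usual 1.5 × IQR fence. A minimal sketch of that rule follows, assuming the per-state distortion factors $\delta(x)$ are already available as a list; the values in `delta` are made up for illustration, and the percentile interpolation used in the paper is not stated, so linear interpolation (`method="inclusive"`) is an assumption.

```python
from statistics import quantiles


def iqr_outlier_flags(values):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    # method="inclusive" interpolates linearly between order statistics (assumed choice).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical distortion factors: most are near a common value, one is far off.
delta = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 3.0]
flags = iqr_outlier_flags(delta)
print(flags)
```

States flagged `True` correspond to the red points in the figure; the remaining states have a distortion factor inside the gray dotted fences.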

Theorems & Definitions (9)

  • Proposition 4.1: Exact realization for linear scalarization
  • Lemma 1.1: Normalization
  • proof
  • Lemma 1.2: Mixed reaching probabilities
  • proof
  • Lemma 1.3: State distribution factorization
  • proof
  • Theorem 1.4: Correctness of the composed forward policy
  • proof