Embarrassingly Parallel GFlowNets

Tiago da Silva; Luiz Max Carvalho; Amauri Souza; Samuel Kaski; Diego Mesquita

TL;DR

The paper tackles sampling from a product of local discrete posteriors in distributed data settings by introducing Embarrassingly Parallel GFlowNets (EP-GFlowNets). The method trains $N$ local GFlowNets in parallel and uses Aggregating Balance (AB) to aggregate them into a global model that samples proportionally to $R(x) \propto \prod_{n=1}^N R_n(x)$, with Contrastive Balance (CB) providing a minimal-parameter, variationally linked training objective. The authors establish theoretical guarantees for AB, connect CB to KL divergence and variational objectives, and bound the impact of imperfect local models via a Jeffrey divergence. Empirically, EP-GFlowNets achieve accuracy comparable to centralized GFlowNets across grid world, multisets, sequence design, Bayesian phylogenetics, and federated Bayesian network structure learning while reducing communication and runtime relative to baselines like PCVI.
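The "minimal-parameter" nature of CB can be illustrated with a toy check. This is a minimal sketch, not the authors' implementation: it assumes the CB residual is obtained by equating the trajectory balance identity $Z \prod p_F = R(x) \prod p_B$ across two trajectories, so that the partition function $Z$ cancels and need not be learned. The DAG, the state names (`s0`, `s1`, `s2`, `x`, `y`), the rewards, and the helper names `log_ratio` and `cb_residual` are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy DAG: root s0, intermediate states s1 and s2, terminal objects x and y.
reward = {"x": 3.0, "y": 1.0}

# Forward policy of a balanced GFlowNet for this DAG (derived by hand from the flows).
p_F = {("s0", "s1"): 0.375, ("s0", "s2"): 0.625,
       ("s1", "x"): 1.0, ("s2", "x"): 0.6, ("s2", "y"): 0.4}

# Backward policy: uniform over the parents of each state, keyed as (child, parent).
p_B = {("s1", "s0"): 1.0, ("s2", "s0"): 1.0,
       ("x", "s1"): 0.5, ("x", "s2"): 0.5, ("y", "s2"): 1.0}

trajectories = [["s0", "s1", "x"], ["s0", "s2", "x"], ["s0", "s2", "y"]]


def log_ratio(traj, p_F, p_B, reward):
    """log of prod p_F / (R(x) * prod p_B) along one complete trajectory."""
    total = -math.log(reward[traj[-1]])
    for s, s_next in zip(traj, traj[1:]):
        total += math.log(p_F[(s, s_next)]) - math.log(p_B[(s_next, s)])
    return total


def cb_residual(t1, t2, p_F, p_B, reward):
    """Squared difference of the two log-ratios; no log Z term appears."""
    return (log_ratio(t1, p_F, p_B, reward) - log_ratio(t2, p_F, p_B, reward)) ** 2


residuals = [cb_residual(a, b, p_F, p_B, reward)
             for a, b in combinations(trajectories, 2)]

# An unbalanced forward policy (assumed perturbation) for comparison.
p_F_bad = dict(p_F)
p_F_bad[("s2", "x")] = 0.5
p_F_bad[("s2", "y")] = 0.5
bad_residuals = [cb_residual(a, b, p_F_bad, p_B, reward)
                 for a, b in combinations(trajectories, 2)]
```

For the balanced policy every pairwise residual vanishes up to floating-point error, while the perturbed policy leaves at least one pair with a nonzero residual.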

Abstract

GFlowNets are a promising alternative to MCMC sampling for discrete compositional random variables. Training GFlowNets requires repeated evaluations of the unnormalized target distribution or reward function. However, for large-scale posterior sampling, this may be prohibitive since it requires traversing the data several times. Moreover, if the data are distributed across clients, employing standard GFlowNets leads to intensive client-server communication. To alleviate both these issues, we propose embarrassingly parallel GFlowNets (EP-GFlowNets). EP-GFlowNet is a provably correct divide-and-conquer method to sample from product distributions of the form $R(\cdot) \propto R_1(\cdot) \cdots R_N(\cdot)$, e.g., in parallel or federated Bayes, where each $R_n$ is a local posterior defined on a data partition. First, in parallel, we train a local GFlowNet targeting each $R_n$ and send the resulting models to the server. Then, the server learns a global GFlowNet by enforcing our newly proposed aggregating balance condition, requiring a single communication step. Importantly, EP-GFlowNets can also be applied to multi-objective optimization and model reuse. Our experiments illustrate the effectiveness of EP-GFlowNets on many tasks, including parallel Bayesian phylogenetics, multi-objective multiset and sequence generation, and federated Bayesian structure learning.
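The product target described in the abstract can be made concrete on a tiny discrete space. The following is a minimal sketch under stated assumptions: the three-element space, the two client reward tables, and the helper names `combine_log_rewards` and `normalize` are hypothetical. It only computes the distribution that the global model is meant to sample from and does not train any GFlowNet.

```python
import math


def combine_log_rewards(local_log_rewards):
    """Sum per-client log-rewards: log R(x) = sum_n log R_n(x), up to a constant."""
    keys = local_log_rewards[0].keys()
    return {x: sum(lr[x] for lr in local_log_rewards) for x in keys}


def normalize(log_r):
    """Turn unnormalized log-rewards into probabilities (max-shifted for stability)."""
    m = max(log_r.values())
    z = sum(math.exp(v - m) for v in log_r.values())
    return {x: math.exp(v - m) / z for x, v in log_r.items()}


# Hypothetical local rewards R_1 and R_2 over the objects "a", "b", "c".
local = [
    {"a": math.log(1.0), "b": math.log(2.0), "c": math.log(1.0)},
    {"a": math.log(3.0), "b": math.log(1.0), "c": math.log(1.0)},
]

# Products are 3, 2, 1, so the target puts mass 1/2, 1/3, 1/6 on "a", "b", "c".
target = normalize(combine_log_rewards(local))
```

Working in log-space keeps the product stable when many clients contribute small rewards. Enumerating the space like this is only feasible for toy problems, which is why the server instead learns a global sampler.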

Paper Structure (44 sections, 9 theorems, 58 equations, 15 figures, 2 tables, 1 algorithm)

Introduction
Preliminaries
Method
Embarrassingly Parallel GFlowNets
Contrastive balance
Experiments
Grid world
Multiset generation
Design of sequences
Bayesian phylogenetic inference
Federated Bayesian network structure learning
Evaluating the CB loss
Conclusions
Proofs
Proof of Lemma 3.5
...and 29 more sections

Key Result

Theorem 3.1

Let $\left(p_F^{(1)}, p_{B}^{(1)}\right), \dots, \left(p_F^{(N)}, p_{B}^{(N)}\right): V^2 \rightarrow \mathbb{R}^+$ be pairs of forward and backward policies from $N$ GFlowNets sampling respectively proportionally to $R_1, \ldots, R_N : \mathcal{X} \rightarrow \mathbb{R}^+$. Then, another GFlowNet

Figures (15)

Figure 1: EP-GFlowNet samples proportionally to a pool of locally trained GFlowNets. If a client correctly trains their local model (green) and another client trains theirs incorrectly (red), the distribution inferred by EP-GFlowNet (mid-right) differs from the target product distribution (right).
Figure 2: Grid world. Each heatmap represents the target distribution (first row), based on the normalized reward, and the ones learned by the local GFlowNets (second row). Results for EP-GFlowNet are in the rightmost panels. As established by Theorem 3.1, the good fit of the local models results in an accurate fit to the combined reward.
Figure 3: Multisets: learned $\times$ ground truth distributions. Plots compare target vs. distributions learned by GFlowNets. The five plots to the left show that the local models were accurately trained. Thus, a well-trained EP-GFlowNet (right) approximates the combined reward well.
Figure 4: Sequences: learned $\times$ ground truth distributions. Plots compare the target to the learned distributions. The five leftmost plots show that the local GFlowNets were well trained. Hence, as implied by Theorem 3.1, EP-GFlowNet approximates the combined reward well.
Figure 5: Bayesian phylogenetic inference: learned $\times$ ground truth distributions. Following the pattern in Figures 2-4, the goodness-of-fit from local GFlowNets (Clients 1-5) is directly reflected in the distribution learned by EP-GFlowNet.
...and 10 more figures

Theorems & Definitions (10)

Theorem 3.1: Aggregating balance condition
Corollary 3.2: Aggregating balance loss
Remark 3.3: Imperfect local inference
Theorem 3.4: Influence of local failures
Lemma 3.5: Contrastive balance condition
Corollary 3.6: Contrastive balance loss
Theorem 3.7: VI & CB
Theorem 2.1
Theorem 3.1$'$: Aggregating balance condition
Theorem 3.4$'$: Influence of local failures
