Ant Colony Sampling with GFlowNets for Combinatorial Optimization

Minsu Kim, Sanghyeok Choi, Hyeonah Kim, Jiwoo Son, Jinkyoo Park, Yoshua Bengio

TL;DR

The paper tackles combinatorial optimization (CO) by addressing the limitations of reward-focused RL pretraining. It proposes GFACS, a hierarchical two-stage approach: a GFlowNet first learns a multi-modal prior over solutions, and Ant Colony Optimization (ACO) then refines that prior through posterior search. Training is enhanced by energy reshaping and off-policy trajectory balance (TB) training. Across seven CO benchmarks, the method outperforms vanilla ACO, several RL baselines, and other GFlowNet training approaches, and it improves further when paired with active search. The result is a modular, scalable framework that balances diversity and optimality, with practical implications for large-scale, constraint-aware CO problems. Theoretical guarantees are left for future work.
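
The TB training mentioned above refers to the trajectory balance objective for GFlowNets. As a minimal sketch (not the paper's implementation), the per-trajectory loss is the squared gap between the log forward flow and the log backward flow. The function name and argument names below are hypothetical, and the log-probabilities are assumed to come from whatever forward and backward policies are in use.

```python
import math

def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
    """Squared trajectory balance residual for a single trajectory.

    log_z         -- learned log partition function estimate
    log_pf_steps  -- log forward-policy probabilities, one per step
    log_pb_steps  -- log backward-policy probabilities, one per step
    log_reward    -- log reward of the terminal solution
    """
    forward = log_z + sum(log_pf_steps)
    backward = log_reward + sum(log_pb_steps)
    return (forward - backward) ** 2

# A two-step trajectory whose forward probability (0.5 * 0.5) matches its
# reward (0.25) with a deterministic backward policy leaves no residual.
balanced = trajectory_balance_loss(
    0.0, [math.log(0.5), math.log(0.5)], [0.0, 0.0], math.log(0.25)
)
```

Because the loss depends only on the trajectory and not on which policy produced it, the trajectories can come from a source other than the current policy, which is what makes off-policy training possible.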

Abstract

We present the Generative Flow Ant Colony Sampler (GFACS), a novel meta-heuristic method that hierarchically combines amortized inference and parallel stochastic search. Our method first leverages Generative Flow Networks (GFlowNets) to amortize a multi-modal prior distribution over the combinatorial solution space that encompasses both high-reward and diversified solutions. This prior is iteratively updated via parallel stochastic search in the spirit of Ant Colony Optimization (ACO), leading to a posterior distribution that generates near-optimal solutions. Extensive experiments across seven combinatorial optimization problems demonstrate GFACS's promising performance.
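
To make the prior-then-search idea concrete, here is a rough sketch of an ACO loop on the TSP in which each ant picks its next node with weight proportional to pheromone times a prior score. It illustrates the general scheme only and is not the GFACS implementation: in GFACS the prior comes from a GFlowNet-trained network, whereas this sketch substitutes the classical inverse-distance heuristic as a stand-in. All function names, parameters, and defaults are hypothetical.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour visiting nodes in the given order."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def sample_tour(pheromone, prior, rng):
    """Construct one tour; next node drawn with weight pheromone * prior."""
    n = len(prior)
    current = rng.randrange(n)
    tour, unvisited = [current], set(range(n)) - {current}
    while unvisited:
        candidates = sorted(unvisited)
        weights = [pheromone[current][j] * prior[current][j] for j in candidates]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        tour.append(current)
        unvisited.remove(current)
    return tour

def aco_search(dist, prior, n_ants=20, n_iters=50, evaporation=0.1, seed=0):
    """Iteratively update pheromone from sampled tours; return the best tour."""
    rng = random.Random(seed)
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = [sample_tour(pheromone, prior, rng) for _ in range(n_ants)]
        # Evaporation: old evidence decays before new deposits are added.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - evaporation
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            # Shorter tours deposit more pheromone on the edges they use.
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                pheromone[a][b] += 1.0 / length
                pheromone[b][a] += 1.0 / length
    return best_tour, best_len

# Four corners of a unit square; the prior is the stand-in 1/distance heuristic.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
prior = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(4)] for i in range(4)]
best_tour, best_len = aco_search(dist, prior)
```

The pheromone matrix plays the role of the iteratively updated distribution here: it starts uniform, so early sampling follows the prior alone, and later iterations concentrate on edges that appeared in short tours.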

Paper Structure

This paper contains 47 sections, 2 theorems, 16 equations, 6 figures, 12 tables.

Key Result

Proposition E.1

Let $\mathcal{G} = (\mathcal{V}, \mathcal{D})$ be a finite, undirected graph where $\mathcal{V}$ is the set of vertices and $\mathcal{D}$ is the set of edges. Assume $\mathcal{G}$ contains a Hamiltonian cycle. Denote $N = |\mathcal{V}|$ as the number of vertices. If $N > 1$, there exist exactly $2N$
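
As a related illustration (not a completion of the proposition, whose statement is cut off above), the following sketch enumerates the vertex sequences that describe one and the same cyclic tour: any of the $N$ vertices can be the starting point, and the cycle can be traversed in either direction. For $N \ge 3$ distinct vertices this gives $2N$ distinct sequences. The function name is hypothetical.

```python
def equivalent_sequences(tour):
    """All vertex orderings that trace the same undirected cycle as `tour`."""
    n = len(tour)
    sequences = set()
    for direction in (list(tour), list(tour)[::-1]):
        for start in range(n):
            # Rotate so that the cycle starts at a different vertex.
            sequences.add(tuple(direction[start:] + direction[:start]))
    return sequences

five_cycle = equivalent_sequences([0, 1, 2, 3, 4])
```

This kind of symmetry matters for a sequential sampler because many distinct construction orders terminate in the same solution.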

Figures (6)

  • Figure 1: Synergy of multi-modal prior distribution of solutions and iterative posterior update with parallel stochastic search in combinatorial optimization.
  • Figure 2: Solution sampling process of GFACS. The GNN, pretrained with a GFlowNet loss, serves as an expert heuristic to pick the next action to construct a solution, such as a tour in the TSP.
  • Figure 3: Experience collection procedure. Energy reshaping compensates for the energy of underrated samples that have potential to become low-energy samples after the local search.
  • Figure 4: Results of ACO algorithms with different priors on various CO tasks. Our GFACS outperforms every ACO baseline. The results are averaged over 3 independent models evaluated on the held-out test sets, and the shade indicates the min-max range of the 3 models.
  • Figure 5: Validation cost on TSP with 200 nodes during training, compared to forward-looking detailed balance (FL-DB).
  • ...and 1 more figure

Theorems & Definitions (4)

  • Proposition E.1: Symmetry Solution in TSP (Hamiltonian Cycle)
  • proof
  • Proposition E.2: Symmetric Solutions in CVRP
  • proof