
Expected flow networks in stochastic environments and two-player zero-sum games

Marco Jiralerspong, Bilun Sun, Danilo Vucetic, Tianyu Zhang, Yoshua Bengio, Gauthier Gidel, Nikolay Malkin

TL;DR

This work shows that EFlowNets outperform other GFlowNet formulations in stochastic tasks such as protein design. It also extends the concept of EFlowNets to adversarial environments, proposing adversarial flow networks (AFlowNets) for two-player zero-sum games.

Abstract

Generative flow networks (GFlowNets) are sequential sampling models trained to match a given distribution. GFlowNets have been successfully applied to various structured object generation tasks, sampling a diverse set of high-reward objects quickly. We propose expected flow networks (EFlowNets), which extend GFlowNets to stochastic environments. We show that EFlowNets outperform other GFlowNet formulations in stochastic tasks such as protein design. We then extend the concept of EFlowNets to adversarial environments, proposing adversarial flow networks (AFlowNets) for two-player zero-sum games. We show that AFlowNets learn to find above 80% of optimal moves in Connect-4 via self-play and outperform AlphaZero in tournaments.
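The abstract describes GFlowNets as sequential samplers trained to match a given distribution, meaning they sample objects with probability proportional to a reward. For reference, here is a minimal sketch of the trajectory balance objective, which earlier GFlowNet work uses to train GFlowNets in deterministic environments. It is not the paper's EFlowNet or AFlowNet objective: those handle stochastic and adversarial transitions and are not reproduced here. The function name and argument layout are hypothetical.

```python
import math
from typing import Sequence


def trajectory_balance_loss(log_z: float,
                            log_pf: Sequence[float],
                            log_pb: Sequence[float],
                            log_reward: float) -> float:
    """Squared trajectory-balance residual for one complete trajectory.

    log_z:      learned log partition function, log Z
    log_pf:     log P_F(s_{t+1} | s_t) for each step of the trajectory
    log_pb:     log P_B(s_t | s_{t+1}) for each step of the trajectory
    log_reward: log R(x) of the terminal object x
    """
    residual = log_z + sum(log_pf) - log_reward - sum(log_pb)
    return residual ** 2


# Toy example (hypothetical): root -> x1 (R = 1) or root -> x2 (R = 3), so Z = 4.
# Each leaf has a single parent, so log P_B = 0 on every step.
loss_at_optimum = trajectory_balance_loss(math.log(4.0), [math.log(0.25)], [0.0], math.log(1.0))
loss_wrong_z = trajectory_balance_loss(0.0, [math.log(0.25)], [0.0], math.log(1.0))
```

When this loss is zero on every trajectory, P_F samples each terminal object x with probability R(x)/Z. In the toy example, that means x2 is drawn 3/4 of the time. The objective assumes that every transition is chosen by the sampler. In a stochastic environment, some transitions are not under the agent's control, and the paper's EFlowNets are designed to address exactly that setting.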

Paper Structure

This paper contains 43 sections, 19 equations, 6 figures, 4 tables, and 1 algorithm.

Figures (6)

  • Figure 1: We extend GFlowNets to stochastic environments (a) and games (b).
  • Figure 2: Results on the TFBind task (five seeds per setting). EFlowNets tend to find more diverse high-reward states, especially when the reward is peaky and environment stochasticity is high, making the Stoch-GFN constraints unsatisfiable.
  • Figure 3: Elo as a function of training steps and training time. By convention, random uniform baseline agents have an Elo of 0. AFlowNets reach an Elo similar to AlphaZero's in tic-tac-toe and quickly learn to outperform AlphaZero in Connect-4.
  • Figure 5: Graphs of learning performance over various training runs. (Left) Average MAE of the learned node flows (not in log space) relative to ground-truth flows computed algorithmically. (Middle) Average MAE of the learned edge flows. (Right) Loss rate of the three training regimes against a random uniform opponent.
  • Figure 6: Graphs of learning performance over various training runs for AFN, DQN, SoftDQN and AlphaZero. (Left) Percentage of optimal moves over all states (for tic-tac-toe, solved through minimax). (Right) Loss rate of the algorithms against a random uniform opponent.
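The Figure 6 caption measures the percentage of optimal moves for tic-tac-toe, with optimality obtained through minimax. The caption does not say how states were enumerated or how ties between equally good moves were treated. What follows is one plausible sketch under these assumptions:

* "all states" means every non-terminal position reachable from the empty board;
* a move counts as optimal if it preserves the position's game-theoretic value (win, draw or loss);
* the evaluated agent is represented by a function that returns one move per state.

The board encoding and every function name here are hypothetical.

```python
from functools import lru_cache
from typing import Callable, List, Optional, Set

# Hypothetical board encoding: a 9-character string in row-major order,
# using "X", "O" and "." for an empty cell. X always moves first.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board: str) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def to_move(board: str) -> str:
    """X is to move whenever both players have placed the same number of marks."""
    return "X" if board.count("X") == board.count("O") else "O"


def legal_moves(board: str) -> List[int]:
    return [i for i, cell in enumerate(board) if cell == "."]


def play(board: str, i: int, player: str) -> str:
    return board[:i] + player + board[i + 1:]


def is_terminal(board: str) -> bool:
    return winner(board) is not None or "." not in board


@lru_cache(maxsize=None)
def value(board: str) -> int:
    """Minimax value for the player to move: +1 win, 0 draw, -1 loss."""
    if winner(board) is not None:
        return -1  # the previous player just completed a line
    if "." not in board:
        return 0   # full board with no line: draw
    player = to_move(board)
    return max(-value(play(board, i, player)) for i in legal_moves(board))


def optimal_moves(board: str) -> Set[int]:
    """Moves that preserve the position's minimax value (win/draw/loss)."""
    player = to_move(board)
    best = value(board)
    return {i for i in legal_moves(board) if -value(play(board, i, player)) == best}


def reachable_states() -> List[str]:
    """All non-terminal positions reachable from the empty board."""
    start = "." * 9
    seen, stack, states = {start}, [start], []
    while stack:
        board = stack.pop()
        if is_terminal(board):
            continue
        states.append(board)
        player = to_move(board)
        for i in legal_moves(board):
            child = play(board, i, player)
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return states


def percent_optimal(policy: Callable[[str], int], states: List[str]) -> float:
    """Fraction of states in which the policy's chosen move is minimax-optimal."""
    hits = sum(policy(board) in optimal_moves(board) for board in states)
    return hits / len(states)


if __name__ == "__main__":
    states = reachable_states()
    # A naive baseline policy (hypothetical): always play the first empty cell.
    first_empty = lambda board: legal_moves(board)[0]
    print(f"{len(states)} states, first-empty policy: "
          f"{100 * percent_optimal(first_empty, states):.1f}% optimal")
```

This definition of optimality is deliberately coarse. Any move that keeps a won position won counts, even if it delays the win. For the same reason, all nine opening moves count as optimal, since each of them leads to a draw under perfect play.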