Diversifying AI: Towards Creative Chess with AlphaZero

Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh

TL;DR

This work asks whether diversity among AI agents can enhance performance in challenging chess tasks. It introduces AZdb, a latent-conditioned league of AlphaZero agents trained to maximize behavioral diversity via intrinsic rewards and coordinated through sub-additive planning and PSRO-style matchmaking. Empirical results show AZdb solves more puzzles than a homogeneous AZ, and that sub-additive planning plus specialized openings yield substantial Elo gains (up to about 50 Elo) over AZ in opening play. The findings suggest diversity bonuses emerge in AI teams, improving exploration, generalization, and problem solving on computationally hard tasks like chess and its puzzles, with implications for designing creative, collaborative AI systems.

Abstract

In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. This work explores whether AI can benefit from creative decision-making mechanisms when pushed to the limits of its computational rationality. In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones. We study this question in the game of chess, the so-called drosophila of AI. We build on AlphaZero (AZ) and extend it to represent a league of agents via a latent-conditioned architecture, which we call AZ_db. We train AZ_db to generate a wider range of ideas using behavioral diversity techniques and select the most promising ones with sub-additive planning. Our experiments suggest that AZ_db plays chess in diverse ways, solves more puzzles as a group, and outperforms a more homogeneous team. Notably, AZ_db solves twice as many challenging puzzles as AZ, including the challenging Penrose positions. When playing chess from different openings, we notice that players in AZ_db specialize in different openings, and that selecting a player for each opening using sub-additive planning results in a 50 Elo improvement over AZ. Our findings suggest that diversity bonuses emerge in teams of AI agents, just as they do in teams of humans, and that diversity is a valuable asset in solving computationally hard problems.

Paper Structure (27 sections, 12 equations, 23 figures, 25 tables)

Figures (23)

  • Figure 1: Left: an illustration of diverse AI systems playing chess. Right: AZdb architecture. For each player $i$, a matchmaker selects an opponent $j$. A latent variable $l^i$ associated with the player conditions the network outputs $p, v, v_d$. A score for each position is computed using $v, v_d$ and the diversity intrinsic reward $r^i$ and used with the prior $p$ for MCTS.
  • Figure 2: Left: Standard deviation across players of the piece occupancies of AZdb. Black rectangles highlight the starting position of each piece. Right: average castling rights of different players. The x-axis corresponds to castling to the Queen (O-O-O) or King (O-O) side as White (W) and Black (B); the y-axis corresponds to players. Color units (in the color bars) correspond to occupancy in $\%$ multiplied by the average number of moves in a game (65). For orientation, the chess coordinate system is displayed at the top left.
  • Figure 3: Opening diversity with AZdb. We selected $8$ popular openings where humans tend to play different moves and present with green arrows the moves chosen by AZdb policies with LCB action selection and $100k$ simulations. For each move, we present in black numbers (top) the $\%$ in which this move was played by chess GMs (taken from the Lichess Masters database), and in blue numbers (below them) the relative win percentage ($\%$ wins - $\%$ losses).
  • Figure 4: Left: Two puzzle positions from the Penrose set. Right: Visualizations of the prior probability for a subset of legal actions, with the corresponding raw value and MCTS value estimates, for AZ and AZ trained on puzzle sets. Correct and incorrect moves are colored in blue and red, respectively.
  • Figure 5: The solve rate of AZ and AZdb with sub-additive planning and max over latents in different puzzle sets. Left: 1 training seed with the full configuration and 100M simulations. Results are averaged over 3 evaluation seeds. Center: 3 training seeds with the fast configuration and 1M simulations. Results are averaged over 3 evaluation seeds and the 3 training seeds. Right: 3 training seeds with the fast configuration and 1M simulations. Results are averaged over 3 evaluation seeds. Sub-additive planning and max over latents are taken over the 3 training seeds.
  • ...and 18 more figures
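
The "max over latents" solve rate named in the Figure 5 caption can be illustrated with a small sketch. It assumes one reading of the term: a puzzle counts as solved by the team if at least one latent-conditioned player solves it. The function names, the player labels, and the toy results below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def individual_solve_rates(solved: Dict[str, List[bool]]) -> Dict[str, float]:
    """Fraction of puzzles each player solves on its own."""
    return {player: sum(results) / len(results) for player, results in solved.items()}


def max_over_latents_solve_rate(solved: Dict[str, List[bool]]) -> float:
    """Fraction of puzzles solved by at least one player (assumed definition)."""
    # zip(*...) regroups the per-player result lists into per-puzzle tuples.
    per_puzzle = zip(*solved.values())
    team_results = [any(outcomes) for outcomes in per_puzzle]
    return sum(team_results) / len(team_results)


# Toy data: three hypothetical players, four puzzles (True = solved).
solved = {
    "latent_0": [True, False, False, True],
    "latent_1": [False, True, False, True],
    "latent_2": [False, False, False, True],
}

print(individual_solve_rates(solved))
print(max_over_latents_solve_rate(solved))
```

In this toy data no single player solves more than half of the puzzles, yet the team covers three of the four because the players' successes fall on different puzzles. A team of identical players would gain nothing from the same aggregation.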