
Modeling the Centaur: Human-Machine Synergy in Sequential Decision Making

David Shoresh, Yonatan Loewenstein

TL;DR

The paper studies how to uncover and exploit relative advantages in sequential decision making when a team combines a human-like agent (Maia, a behavioral clone of human players) with a strong neural network (Leela) in chess. It frames the team as a Mixture of Experts (MoE) in which a separate manager chooses between the base policies. With chess modeled as a Markov decision process, it compares three managers (a transformer trained with reinforcement learning, the Stockfish engine acting as a domain expert, and an oracle) by playing out games from opening positions. The results show substantial potential for synergy: the RL manager often outperforms the domain expert, and the oracle reveals large remaining headroom, although synergy declines as the asymmetry between team members grows. These findings suggest that human-centric AI design can exploit learned signals of relative advantage even when a partner is highly capable but hard to interpret, and they offer a concrete framework for quantifying and exploiting such synergy in complex sequential tasks.
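To make the manager idea concrete, here is a minimal sketch of a hard-gated mixture of two base policies, under loudly stated assumptions: the "game" is a hypothetical toy random walk rather than chess, the two experts `A` and `B` are invented stand-ins with complementary strengths, and the managers are illustrative only. `random_manager` mirrors the p = 0.5 random mixture baseline; `rollout_manager` is a crude Monte Carlo stand-in that estimates a Q value for each expert's move by playing the game out. It is not the paper's oracle, RL manager, or Q-estimation procedure.

```python
import random
from typing import Callable

# Hypothetical toy game (not chess): the state is an integer "advantage" x.
# Reaching +GOAL is a win, reaching -GOAL a loss; running out of steps is a draw.
GOAL, MAX_STEPS = 3, 30

Expert = Callable[[int, random.Random], int]   # returns a step of +1 or -1
Manager = Callable[[int, random.Random], str]  # returns the name of an expert


def make_expert(p_ahead: float, p_behind: float) -> Expert:
    """Expert that improves the position with prob p_ahead when x >= 0, else p_behind."""
    def act(x: int, rng: random.Random) -> int:
        p_good = p_ahead if x >= 0 else p_behind
        return 1 if rng.random() < p_good else -1
    return act


# Two hypothetical experts with complementary relative advantages.
EXPERTS = {"A": make_expert(0.7, 0.4), "B": make_expert(0.4, 0.7)}


def play(manager: Manager, rng: random.Random, x: int = 0,
         steps_left: int = MAX_STEPS) -> float:
    """Play out one game; at every step the manager hands the move to ONE expert."""
    for _ in range(steps_left):
        x += EXPERTS[manager(x, rng)](x, rng)  # hard gating: only the chosen expert acts
        if x >= GOAL:
            return 1.0
        if x <= -GOAL:
            return 0.0
    return 0.5


def random_manager(x: int, rng: random.Random) -> str:
    """Baseline mixture policy: pick either expert with probability 0.5."""
    return "A" if rng.random() < 0.5 else "B"


def rollout_manager(n_rollouts: int = 32) -> Manager:
    """Pick the expert with the higher Monte Carlo Q estimate: the average outcome of
    playing that expert's move and then finishing the game with random_manager.
    For simplicity, rollouts get a fresh MAX_STEPS budget."""
    def choose(x: int, rng: random.Random) -> str:
        def q(name: str) -> float:
            total = 0.0
            for _ in range(n_rollouts):
                y = x + EXPERTS[name](x, rng)
                if y >= GOAL:
                    total += 1.0
                elif y > -GOAL:  # non-terminal: finish the game with the random manager
                    total += play(random_manager, rng, y)
                # y <= -GOAL is an immediate loss and adds 0
            return total / n_rollouts
        return max(EXPERTS, key=q)
    return choose


def mean_score(manager: Manager, games: int, seed: int = 0) -> float:
    """Average game score of a manager over independent games."""
    rng = random.Random(seed)
    return sum(play(manager, rng) for _ in range(games)) / games


if __name__ == "__main__":
    print("random :", mean_score(random_manager, 200))
    print("rollout:", mean_score(rollout_manager(), 200))
```

Because the toy experts are each better in a different region of the state space, a manager that reads the state can beat the random mixture without ever seeing the experts' internals, which is the kind of relative-advantage signal the paper targets.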

Abstract

The field of collective intelligence studies how teams can achieve better results than any of their members alone. Human-machine teams pose unique challenges in this regard. For example, human teams often achieve synergy by communicating to discover their relative advantages, which is not an option when the partner is a deep neural network that cannot explain its decisions. Between 2005 and 2008, a series of "freestyle" chess tournaments was held in which human-machine teams, known as "centaurs", outperformed both the best humans and the best machines playing alone. Centaur players reported that they identified relative advantages between themselves and their chess program, even though the program was superhuman. Inspired by this, and leveraging recent open-source models, we study human-machine-like teams in chess. A human behavioral clone ("Maia") and a pure self-play RL-trained chess engine ("Leela") were composed into a team using a Mixture of Experts (MoE) architecture. By directing our research question at the selection mechanism of the MoE, we could isolate the problem of extracting relative advantages without knowledge sharing. We show that, in principle, there is high potential for synergy between human and machine in a complex sequential decision environment such as chess. Furthermore, we show that an expert can identify only a small part of these relative advantages, and that the contribution of its subject-matter expertise to doing so saturates quickly. This is probably due to the "curse of knowledge" phenomenon. We also use reinforcement learning to train a network, without chess expertise, to recognize relative advantages, and it outperforms the expert. We repeat our experiments with asymmetric teams, in which identifying relative advantages is more challenging. Our findings contribute to the study of collective intelligence and human-centric AI.
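The synergy criterion above (the team beats every member playing alone) can be checked with a few lines of code. This sketch assumes the standard chess scoring convention (win = 1, draw = 0.5, loss = 0) to turn win/draw/loss counts into a single score; the paper's exact WDL aggregation may differ. The `WDL` container and `synergy` helper are hypothetical names, and the counts are made up for illustration, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WDL:
    """Win/draw/loss counts from a batch of games (hypothetical container)."""
    wins: int
    draws: int
    losses: int

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses

    def score(self) -> float:
        # Assumed convention: win = 1, draw = 0.5, loss = 0, averaged over games.
        return (self.wins + 0.5 * self.draws) / self.games


def synergy(team: WDL, members: dict[str, WDL]) -> float:
    """Margin by which the team beats its best member alone (> 0 means synergy)."""
    best = max(m.score() for m in members.values())
    return team.score() - best


# Illustrative made-up counts, not results from the paper.
maia = WDL(wins=40, draws=20, losses=40)
leela = WDL(wins=35, draws=25, losses=40)
team = WDL(wins=50, draws=20, losses=30)
print(f"{synergy(team, {'Maia': maia, 'Leela': leela}):+.3f}")  # → +0.100
```

Comparing against the best member, rather than the average, is what makes the criterion strict: a team that merely interpolates between its members never shows positive synergy.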

Paper Structure

This paper contains 21 sections, 6 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Process for obtaining estimated Q values for RL
  • Figure 2: Transformer model architecture
  • Figure 3: WDL of the approximately symmetric team under different managers. "Random" refers to a random mixture policy with p = 0.5. For the subject-matter expert, results are shown for a Stockfish manager set to depths 1, 3, 5, and 15. Synergy corresponds to results above the WDL line for Maia alone. Error bars represent SEM.
  • Figure 4: WDL for asymmetric teams. Synergy corresponds to results above the WDL of Leela alone (dashed line). Error bars represent SEM.
  • Figure 5: Distribution of choices between Maia and Leela. The oracle chooses the inferior player (Leela in the symmetric team and Maia in the others) more sparingly than the other managers.
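The figures report error bars as SEM and, in Figure 5, how often each manager delegates to Maia or Leela. A minimal sketch of both summaries follows, assuming the SEM is taken over per-game scores (win = 1, draw = 0.5, loss = 0) and that choices are logged as one player name per move; both are assumptions, and the helper names and data are hypothetical, not from the paper.

```python
import statistics
from collections import Counter


def mean_and_sem(scores: list[float]) -> tuple[float, float]:
    """Mean per-game score and its standard error of the mean (error bars)."""
    if len(scores) < 2:
        raise ValueError("need at least two games to estimate the SEM")
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.mean(scores), statistics.stdev(scores) / len(scores) ** 0.5


def choice_fractions(choices: list[str]) -> dict[str, float]:
    """Share of moves the manager delegated to each player."""
    counts = Counter(choices)
    return {name: n / len(choices) for name, n in counts.items()}


# Illustrative made-up data, not results from the paper.
scores = [1.0, 0.5, 0.0, 1.0]  # per-game score: win = 1, draw = 0.5, loss = 0
print(mean_and_sem(scores))
print(choice_fractions(["Maia", "Leela", "Maia", "Maia"]))
```

A manager that picks the inferior player sparingly, as the Figure 5 caption reports for the oracle, would show a small fraction for that player in `choice_fractions`.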