A Simple, Solid, and Reproducible Baseline for Bridge Bidding AI

Haruka Kita, Sotetsu Koyamada, Yotaro Yamaguchi, Shin Ishii

TL;DR

This work tackles the bidding phase of contract bridge with a simple, reproducible baseline that combines supervised pretraining on SAYC-derived data with PPO-based reinforcement learning and fictitious self-play (FSP). The model takes a 480-dimensional binary input and uses a 4-layer, 1024-unit MLP to output a policy over 38 bidding actions together with a value estimate; the end-of-game reward is the DDS-derived score normalized as $z = \text{score} / 7600$. Ablation studies show that supervised pretraining is essential for surpassing WBridge5, while fictitious self-play helps stabilize training. The method scores +1.24 IMPs/board against WBridge5, and the code and models are released as open source to support reproducibility, evaluation, and further work on bridge AI.
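The action space and reward scaling described above can be illustrated with a minimal Python sketch. It is not the authors' released code: the names `ACTIONS`, `MAX_ABS_SCORE`, and `normalize_reward`, as well as the ordering of actions, are assumptions made for illustration. The sketch relies only on two facts: 38 actions match the 35 contract bids (7 levels times 5 strains) plus Pass, Double, and Redouble, and dividing by 7600 (the largest penalty possible on a single deal) keeps the reward in $[-1, 1]$.

```python
# Hypothetical helpers for illustration; names and action ordering are assumptions,
# not taken from the paper's code base.

MAX_ABS_SCORE = 7600  # normalizer from the text: z = score / 7600

STRAINS = ["C", "D", "H", "S", "NT"]

# 7 levels x 5 strains = 35 contract bids, plus Pass, Double (X), Redouble (XX).
ACTIONS = ["Pass", "X", "XX"] + [
    f"{level}{strain}" for level in range(1, 8) for strain in STRAINS
]


def normalize_reward(score: int) -> float:
    """Scale a raw end-of-deal score into [-1, 1] via z = score / 7600."""
    if abs(score) > MAX_ABS_SCORE:
        raise ValueError(f"score {score} is outside the range of legal bridge scores")
    return score / MAX_ABS_SCORE


if __name__ == "__main__":
    print(len(ACTIONS))            # size of the action space
    print(normalize_reward(-7600)) # worst possible result maps to the lower bound
    print(normalize_reward(3800))  # a score of half the maximum magnitude
```

Keeping the reward bounded in this way is a common choice when a value head is trained alongside the policy, since the regression target then has a fixed range regardless of the contract.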

Abstract

Contract bridge, a cooperative game characterized by imperfect information and multi-agent dynamics, poses significant challenges and serves as a critical benchmark in artificial intelligence (AI) research. Success in this domain requires agents to cooperate effectively with their partners. This study demonstrates that an appropriate combination of existing methods can perform surprisingly well in bridge bidding against WBridge5, a leading benchmark program for bridge bidding and a multiple-time World Computer-Bridge Championship winner. Our approach is notably simple, yet it outperforms the current state-of-the-art methods in this field. Furthermore, we have made our code and models publicly available as open-source software. This release provides a strong starting point for future bridge AI research, facilitating the development and verification of new strategies in the field.

Paper Structure

This paper contains 12 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Ablation of each training component.
  • Figure 2: Comparison of usual self-play (w/o FSP) and FSP. Each entry shows the tanh-scaled IMPs/board obtained by the model at X steps against the model at Y steps (X > Y).