Mastering Chinese Chess AI (Xiangqi) Without Search

Yu Chen, Juntong Lin, Zhichao Shu

TL;DR

Value Estimation with Cutoff (VECT) improves the training process of the original PPO algorithm, and an explanation of why is given.

Abstract

We have developed a high-performance Chinese Chess AI that operates without relying on search algorithms. This AI has demonstrated the ability to compete at the level of the top 0.1% of human players. By eliminating the search process typically associated with such systems, it achieves a queries-per-second (QPS) rate more than a thousand times that of systems based on Monte Carlo Tree Search (MCTS) and more than a hundred times that of systems based on alpha-beta pruning. The AI training system consists of two parts: supervised learning and reinforcement learning. Supervised learning provides an initial, human-like Chinese chess AI, while reinforcement learning, starting from the supervised model, raises the strength of the AI to a new level. Using this training system, we carried out extensive ablation experiments and found that (1) with the same number of parameters, a Transformer architecture outperforms a CNN on Chinese chess; (2) using the possible moves of both sides as input features greatly improves the training process; (3) a selective opponent pool, compared with pure self-play training, yields a faster improvement curve and a higher strength ceiling; and (4) Value Estimation with Cutoff (VECT) improves the training process of the original PPO algorithm, and we explain why.
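The selective opponent pool mentioned in the abstract can be illustrated with a minimal sketch. The selection rule is not specified in this section, so the rule below is an assumption made purely for illustration: opponents the learner beats less often are sampled more often. The class name `OpponentPool`, its methods, and the moving-average update are all hypothetical and are not the authors' implementation.

```python
import random


class OpponentPool:
    """Hypothetical pool of frozen opponents with running win-rate estimates."""

    def __init__(self, seed=None):
        # opponent id -> learner's estimated score (win rate) against that opponent
        self.win_rate = {}
        self.rng = random.Random(seed)

    def add(self, opponent_id, initial_win_rate=0.5):
        """Register a frozen opponent with a neutral prior win rate."""
        self.win_rate[opponent_id] = initial_win_rate

    def update(self, opponent_id, learner_score, lr=0.1):
        """Update the estimate after one game.

        learner_score: 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
        Uses an exponential moving average (an assumption of this sketch).
        """
        old = self.win_rate[opponent_id]
        self.win_rate[opponent_id] = (1.0 - lr) * old + lr * learner_score

    def sample(self):
        """Pick an opponent, favoring those the learner struggles against."""
        ids = list(self.win_rate)
        # Weight is the learner's failure rate; the small floor keeps
        # already-beaten opponents reachable so they are not forgotten.
        weights = [max(1.0 - self.win_rate[i], 0.01) for i in ids]
        return self.rng.choices(ids, weights=weights, k=1)[0]
```

In contrast, pure self-play would always return the latest copy of the learner; a pool of this kind instead spends more games on opponents that still expose weaknesses.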

Paper Structure (18 sections, 7 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: The upper half of the image shows the supervised learning framework used during training, which includes modules such as data cleaning, data sampling, and auxiliary tasks. The lower half shows the reinforcement learning flow used during training, including restoring the supervised learning model, the Dynamics Opponent Pool, and the PPO algorithm.
  • Figure 2: Comparison of accuracy at different game stages between uniform sampling and sampling with a curve.
  • Figure 3: Win (draw) rate for modResNet18 versus the baseline with different features. From top to bottom, the feature set adds the game board state, ally valid moves, and enemy valid moves step by step.
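The feature sets compared in Figure 3 (board state, then ally valid moves, then enemy valid moves) can be pictured as extra input planes stacked on top of the board planes. The exact encoding is not given in this section, so the sketch below rests on an assumption: each move plane marks the destination squares of one side's legal moves. The function names and the move format `((from_row, from_col), (to_row, to_col))` are hypothetical, and move generation itself is assumed to come from a separate rules engine.

```python
ROWS, COLS = 10, 9  # a Xiangqi board has 10 ranks and 9 files


def empty_plane():
    """Return a ROWS x COLS plane filled with zeros."""
    return [[0] * COLS for _ in range(ROWS)]


def move_plane(moves):
    """Mark the destination square of every legal move with 1.

    moves: iterable of ((from_row, from_col), (to_row, to_col)) pairs.
    """
    plane = empty_plane()
    for _from_sq, (row, col) in moves:
        plane[row][col] = 1
    return plane


def build_features(board_planes, ally_moves, enemy_moves):
    """Stack board planes with one plane per side's valid moves."""
    return list(board_planes) + [move_plane(ally_moves), move_plane(enemy_moves)]


# Toy usage with hand-written moves rather than a real move generator.
board_planes = [empty_plane()]
ally_moves = [((9, 0), (8, 0)), ((9, 0), (7, 0))]
enemy_moves = [((0, 4), (1, 4))]
features = build_features(board_planes, ally_moves, enemy_moves)
```

Dropping the last one or two planes from `features` gives the smaller feature sets that the figure compares against.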