Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning

Yuhao Chen, Shuochen Liu, Yuanjie Lyu, Chao Zhang, Jiayao Shi, Tong Xu

TL;DR

Xiangqi-R1 addresses spatial strategic reasoning in fully observable board games by training a 7B LLM through a three-stage pipeline on a large Xiangqi dataset augmented with expert annotations and engine evaluations. The first two stages are supervised fine-tuning (SFT), first on board-move pairs and then on data containing commentary; the third stage is expert-guided reinforcement learning with GRPO, using multi-dimensional rewards for moves, analysis, and formatting. Compared to larger general-purpose LLMs, Xiangqi-R1 achieves an 18% rise in move legality and a 22% boost in analysis accuracy, suggesting that domain-specific structure and reinforcement signals can unlock strategic capabilities in compact models. The work also provides a Xiangqi-specific evaluation framework and shows that small, specialized models can match or exceed larger models in strategic reasoning, with implications for general strategic intelligence in complex spatial domains.
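To make the third stage more concrete, the following is a minimal sketch of how a multi-dimensional reward could feed GRPO's group-relative advantage. It is not the authors' implementation: the `RewardParts` fields, the weights in `combined_reward`, and the example scores are all hypothetical, and this page does not specify the paper's actual reward definitions. The only part taken from the standard GRPO formulation is that each sampled response's reward is normalized by the mean and standard deviation of its own group.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RewardParts:
    """Hypothetical per-response reward components, each assumed to lie in [0, 1]."""
    move: float      # quality of the suggested move (e.g. scored against an engine)
    analysis: float  # agreement of the written analysis with a reference evaluation
    fmt: float       # 1.0 if the output follows the required template, else 0.0


def combined_reward(parts, w_move=1.0, w_analysis=1.0, w_fmt=0.5):
    """Weighted sum of the three components; the weights are illustrative assumptions."""
    return w_move * parts.move + w_analysis * parts.analysis + w_fmt * parts.fmt


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical responses sampled for the same board state.
group = [
    RewardParts(move=1.0, analysis=0.8, fmt=1.0),  # good move, good analysis
    RewardParts(move=0.0, analysis=0.5, fmt=1.0),  # poor move, well formatted
    RewardParts(move=0.5, analysis=0.5, fmt=0.0),  # mediocre move, bad format
]
rewards = [combined_reward(p) for p in group]
advantages = group_relative_advantages(rewards)
```

Because the advantage is relative to the group, no separate value model is needed: a response is reinforced only to the extent that it scores better than its siblings for the same position, and a group in which every response scores the same contributes zero advantage.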

Abstract

Game playing has long served as a fundamental benchmark for evaluating Artificial General Intelligence. While Large Language Models (LLMs) have demonstrated impressive capabilities in general reasoning, their effectiveness in spatial strategic reasoning, which is critical for complex and fully observable board games, remains insufficiently explored. In this work, we adopt Chinese Chess (Xiangqi) as a challenging and rich testbed due to its intricate rules and spatial complexity. To advance LLMs' strategic competence in such environments, we propose a training framework tailored to Xiangqi, built upon a large-scale dataset of five million board-move pairs enhanced with expert annotations and engine evaluations. Building on this foundation, we introduce Xiangqi-R1, a 7B-parameter model trained in a multi-stage manner. Our experimental results indicate that, despite their size and power, general-purpose LLMs struggle to achieve satisfactory performance on these tasks. Compared to general-purpose LLMs, Xiangqi-R1 achieves an 18% rise in move legality and a 22% boost in analysis accuracy. Our results point to a promising path toward general strategic intelligence in complex domains.

Paper Structure

This paper contains 41 sections, 7 equations, 7 figures, 5 tables, and 1 algorithm.

Figures (7)

  • Figure 1: The Spatial Strategic Reasoning capability involves generating a coherent situation analysis and an appropriate move suggestion, conditioned on a given board state.
  • Figure 2: Overview of our proposed framework. Stage 1: SFT with Board-Move Pair Data. Stage 2: SFT with data containing analysis. Stage 3: A fine-grained reward function is constructed and combined with a GRPO-based RL method to further enhance the model’s spatial strategic reasoning capabilities.
  • Figure 3: Performance of Xiangqi-R1 across piece counts.
  • Figure 4: Xiangqi-R1 vs. Stage2: performance by piece count.
  • Figure 5: Xiangqi-R1 accuracy and usage by piece type.
  • ...and 2 more figures