Efficient Learning in Chinese Checkers: Comparing Parameter Sharing in Multi-Agent Reinforcement Learning

Noah Adhikari, Allen Gu

TL;DR

This work introduces a faithful six-player Chinese Checkers MARL environment within PettingZoo and systematically compares three parameter-sharing configurations for multi-agent PPO. It demonstrates that full parameter sharing dramatically accelerates training and improves win rates against random opponents, while reducing game length, highlighting a strong advantage for homogeneous, multi-agent setups. The study analyzes policy strategies via heatmaps and head-to-head matches, and discusses exploration and scaling challenges on larger boards, providing a practical, reusable framework for future homogeneous MARL research. Overall, the results indicate that parameter sharing is a highly effective inductive bias for self-play in symmetric, multi-agent environments, with broad implications for scalable MARL research in similar domains.

Abstract

We show that multi-agent reinforcement learning (MARL) with full parameter sharing outperforms independent and partially shared architectures in the competitive, perfect-information, homogeneous game of Chinese Checkers. To run our experiments, we develop a new MARL environment: variable-size, six-player Chinese Checkers. This custom environment was developed in PettingZoo and supports all traditional rules of the game, including chaining jumps. This is, to the best of our knowledge, the first implementation of Chinese Checkers that remains faithful to the true game. Chinese Checkers is difficult to learn due to its large branching factor and potentially infinite horizons. We borrow the concept of branching actions (submoves) from complex action spaces in other RL domains, where a submove may not end a player's turn immediately. This drastically reduces the dimensionality of the action space. Our observation space is inspired by AlphaGo, with many binary game boards stacked in a 3D array to encode information. The PettingZoo environment, training and evaluation logic, and analysis scripts can be found on GitHub: https://github.com/noahadhikari/pettingzoo-chinese-checkers
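To make the stacked-binary-board idea concrete, here is a minimal sketch of one way such an observation could be built. It is not the environment's actual encoding: the function name `encode_observation`, the integer board layout (-1 for an empty cell, 0 to 5 for the owning player), the square 5x5 grid, and the choice of one plane per player plus one plane for empty cells are all assumptions made for illustration.

```python
import numpy as np


def encode_observation(board: np.ndarray, num_players: int = 6) -> np.ndarray:
    """Stack one binary plane per player plus one plane for empty cells.

    `board` is a hypothetical 2D integer grid (an assumption, not the
    repository's format): -1 marks an empty cell and 0..num_players-1
    marks a peg owned by that player.
    """
    # One boolean plane per player: True where that player has a peg.
    planes = [board == player for player in range(num_players)]
    # One extra plane marking the empty cells.
    planes.append(board == -1)
    # Resulting shape: (num_players + 1, height, width).
    return np.stack(planes, axis=0).astype(np.int8)


# Toy example on a square grid; a real board is star-shaped.
board = np.full((5, 5), -1)
board[0, 0] = 0  # a peg belonging to player 0
board[4, 4] = 3  # a peg belonging to player 3
obs = encode_observation(board)
```

Under these assumptions every cell is set in exactly one plane, so the planes sum to an all-ones board. A real encoding would likely add further planes, for example to mask cells outside the star or to mark target triangles.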

Paper Structure (32 sections, 2 equations, 6 figures, 3 algorithms)

Figures (6)

  • Figure 1: Game boards of size $N = 2, 3, 4$
  • Figure 2: Evaluation occurred at repeated intervals throughout training for fully-independent, shared-encoder, and fully-shared multi-agent configurations. Left: Win rate of policy against five random policies. Middle: Average game length (turns made by policy). Right: Average rewards.
  • Figure 3: Heatmaps for the policy trained through full parameter sharing. These plots show how often pegs occupy each board location after a given number of turns. The trained policy plays as red, with pegs starting in the top triangle.
  • Figure 4: Win rates and game length of all three architectures piloting two randomized players per game throughout training.
  • Figure 5: Effect of varying the entropy coefficient $c$ on average game lengths against random opponents.
  • ...and 1 more figure