Can Large Language Models Master Complex Card Games?
Wei Wang, Fuqing Bie, Junzhe Chen, Dan Zhang, Shiyu Huang, Evgeny Kharlamov, Jie Tang
TL;DR
The paper investigates whether large language models (LLMs) can reach or approach expert performance in complex card games by systematically fine-tuning them on high-quality gameplay data across eight diverse games. It demonstrates that, with supervised fine-tuning on high-quality trajectories generated by strong game AIs, LLMs can approach the performance of game-specific AIs on several games and can learn multiple games simultaneously, albeit at some cost to their general abilities. The study also shows that mixing data from multiple games improves performance when rule sets are similar but introduces conflicts between dissimilar games, and that preserving general capabilities requires adding general instruction data during fine-tuning. Overall, the work highlights the potential of LLMs as versatile learners for game environments and offers a data-centric approach to improving performance across multiple complex tasks. The code is released to facilitate replication and further research.
Abstract
Complex games have long been an important benchmark for testing the progress of artificial intelligence algorithms. AlphaGo, AlphaZero, and MuZero have defeated top human players in Go and chess, garnering widespread societal attention towards artificial intelligence. Concurrently, large language models (LLMs) have exhibited remarkable capabilities across various tasks, raising the question of whether LLMs can achieve similar success in complex games. In this paper, we explore the potential of LLMs in mastering complex card games. We systematically assess the learning capabilities of LLMs across eight diverse card games, evaluating the impact of fine-tuning on high-quality gameplay data and examining the models' ability to retain general capabilities while mastering these games. Our findings indicate that: (1) LLMs can approach the performance of strong game AIs through supervised fine-tuning on high-quality data, (2) LLMs can achieve a certain level of proficiency in multiple complex card games simultaneously, with performance gains for games with similar rules and conflicts for dissimilar ones, and (3) LLMs experience a decline in general capabilities when mastering complex games, but this decline can be mitigated by integrating a certain amount of general instruction data. The evaluation results demonstrate the strong learning ability and versatility of LLMs. The code is available at https://github.com/THUDM/LLM4CardGame
