Empowering LLMs in Decision Games through Algorithmic Data Synthesis

Haolin Wang, Xueyan Li, Yazhe Niu, Shuai Hu, Hongsheng Li

TL;DR

This work addresses the challenge of improving deep reasoning in LLMs by leveraging decision-making games as a textual data source. It introduces Mastermind-Dou and Mastermind-Go, trained via carefully designed data-synthesis pipelines on Doudizhu and Go, to enhance multi-step reasoning and decision-making. Empirical results show competitive performance in the games and measurable gains on general reasoning benchmarks, with insights into data collection strategies and model generalization. The findings suggest that decision-making game data can complement code and math data to strengthen structured, long-horizon reasoning and pave the way for more capable generalist agents.

Abstract

Large Language Models (LLMs) have exhibited impressive capabilities across numerous domains, yet they often struggle with complex reasoning and decision-making tasks. Decision-making games, which inherently require multifaceted reasoning logic, serve as ideal sandboxes for evaluating and enhancing the reasoning abilities of LLMs. In this work, we first explore whether LLMs can master complex decision-making games through targeted post-training. To this end, we design data synthesis strategies and curate extensive offline datasets from two classic games, Doudizhu and Go. We further develop a suite of techniques to effectively incorporate this data into LLM training, resulting in two novel agents: Mastermind-Dou and Mastermind-Go. Our experimental results demonstrate that these Mastermind LLMs achieve competitive performance in their respective games. Additionally, we explore whether integrating decision-making data can enhance the general reasoning abilities of LLMs. Our findings suggest that such post-training improves certain aspects of reasoning, providing valuable insights for optimizing LLM data collection and synthesis strategies.

Paper Structure

This paper contains 25 sections, 6 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: By leveraging synthesized data from diverse decision-making games (such as Doudizhu and Go), current LLMs can be meticulously refined and enhanced, paving the way for their evolution into highly capable and intelligent agents in the future.
  • Figure 2: Mastermind-Dou training pipeline. The LLM first predicts probable legal moves, analyzes the corresponding opponent responses, and then determines the optimal decision.
  • Figure 3: Overview of the Mastermind-Go training pipeline. It takes a hierarchical approach, with four distinct tasks that generate progressive and varied Go board analyses. These analyses are then converted into textual data for fine-tuning the LLM.
  • Figure 4: An example of step-by-step reasoning in Go.
  • Figure 5: Doudizhu game replay for DouZero (landlord) vs. DouZero (peasants). The key frames of a complete game are shown; the peasants ultimately win.
  • ...and 3 more figures