Reasoning, Memorization, and Fine-Tuning Language Models for Non-Cooperative Games

Yunhao Yang, Leonard Berthellemy, Ufuk Topcu

TL;DR

We develop a method that integrates the tree of thoughts with a multi-agent framework to enhance the capability of pre-trained language models to solve complex, unfamiliar games, and demonstrate its efficiency and scalability.

Abstract

We develop a method that integrates the tree of thoughts with a multi-agent framework to enhance the capability of pre-trained language models to solve complex, unfamiliar games. The method decomposes game-solving into four incremental tasks -- game summarization, area selection, action extraction, and action validation -- each assigned to a dedicated language-model agent. By constructing a tree of thoughts, the method simulates reasoning paths and allows the agents to collaboratively distill game representations and tactics, mitigating the limitations of language models in reasoning and long-term memorization. Additionally, an automated fine-tuning process further optimizes the agents' performance by ranking query-response pairs based on game outcomes, e.g., winning or losing. We apply the method to a non-cooperative game and achieve a 65 percent win rate against benchmark algorithms, with an additional 10 percent improvement after fine-tuning. In contrast to existing deep learning algorithms for game solving, which require millions of training samples, the proposed method requires approximately 1,000 training samples, highlighting its efficiency and scalability.
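The four-task decomposition and the outcome-based selection of fine-tuning data described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each agent is modeled as a hypothetical callable that maps a prompt string to a reply string, the prompt wording is invented, the tree of thoughts is simplified to a linear retry loop that backtracks to area selection whenever the validation agent rejects an action, and the outcome ranking (win above draw above loss, keeping only wins by default) is an assumed scoring rule. The names `Trace`, `ask`, `choose_action`, `select_finetuning_pairs`, and `OUTCOME_RANK` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical agent interface: a callable mapping a prompt to a text reply.
Agent = Callable[[str], str]


@dataclass
class Trace:
    """Query-response pairs recorded while playing one game."""
    pairs: List[Tuple[str, str]] = field(default_factory=list)


def ask(agent: Agent, prompt: str, trace: Trace) -> str:
    """Query one agent and log the (prompt, reply) pair for later fine-tuning."""
    reply = agent(prompt)
    trace.pairs.append((prompt, reply))
    return reply


def choose_action(rules: str, board: str, agents: Dict[str, Agent],
                  trace: Trace, max_tries: int = 3) -> Optional[str]:
    """Summarize -> select area -> extract action -> validate.

    Simplified stand-in for the tree of thoughts (an assumption): a rejected
    action backtracks to area selection, up to max_tries times.
    """
    summary = ask(agents["summarize"], f"Summarize these rules:\n{rules}", trace)
    for _ in range(max_tries):
        area = ask(agents["area"], f"{summary}\nBoard:\n{board}\nSelect an area.", trace)
        action = ask(agents["action"], f"{summary}\nArea:\n{area}\nPropose one action.", trace)
        verdict = ask(agents["validate"],
                      f"{summary}\nIs the action '{action}' legal? Answer yes or no.", trace)
        if verdict.strip().lower().startswith("yes"):
            return action
    return None  # no validated action within the retry budget


# Assumed outcome ranking; the abstract only says pairs are ranked by game outcome.
OUTCOME_RANK = {"win": 2, "draw": 1, "loss": 0}


def select_finetuning_pairs(games: List[Tuple[Trace, str]],
                            min_rank: int = 2) -> List[Tuple[str, str]]:
    """Rank games by outcome and keep pairs from games ranked at least min_rank."""
    ranked = sorted(games, key=lambda g: OUTCOME_RANK[g[1]], reverse=True)
    selected: List[Tuple[str, str]] = []
    for trace, outcome in ranked:
        if OUTCOME_RANK[outcome] >= min_rank:
            selected.extend(trace.pairs)
    return selected
```

In practice each callable would wrap a language-model query; with deterministic stub agents the loop can be tested offline. Logging every query-response pair during play is what lets the later filtering step build a fine-tuning set from game outcomes alone, without manual labeling.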

Paper Structure

This paper contains 24 sections, 3 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: An example of the Goedendag game. The example shows two players $p_1, p_2$ and five states $q_1, \ldots, q_5$. The game board comprises four hexagons; the blue and red hexagons indicate where the pieces of $p_1$ and $p_2$ are located. Solid arrows are actions taken by the players, and dashed arrows are state transitions; each transition is triggered by the action of the same color. $q_5$ is a terminal draw state, in which neither player wins.
  • Figure 2: Demonstration of the method. The left panel shows the tree of thoughts that connects multiple tasks to solve the game. The right panel shows the four tasks, each of which is assigned to a language-model agent.
  • Figure 3: An example of a selected area.
  • Figure 4: Example of a 4x4 board (left) and a 10x10 board (right).
  • Figure 5: Cross-entropy loss at every epoch during fine-tuning. The area and action agents converge to a lower loss than the end-to-end agent, potentially indicating better performance.
  • ...and 1 more figure
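The game structure described in the Figure 1 caption (two players, states $q_1, \ldots, q_5$, transitions triggered by player actions, and a terminal draw state) can be sketched as a small deterministic transition system. This is an illustrative assumption, not the paper's game engine: the caption does not give the exact edges, so the linear chain, the alternating turns, and the action names `a1` to `a4` below are hypothetical, as is the helper `run`.

```python
from typing import Dict, List, Optional, Tuple

# (state, player, action) -> next state. All labels are hypothetical.
Transitions = Dict[Tuple[str, str, str], str]


def run(start: str, moves: List[Tuple[str, str]], transitions: Transitions,
        terminal: Dict[str, Optional[str]]) -> Tuple[str, str]:
    """Apply (player, action) moves until they run out or a terminal state is hit.

    terminal maps each terminal state to the winning player, or None for a draw.
    """
    state = start
    for player, action in moves:
        if state in terminal:
            break  # the game has ended; ignore remaining moves
        state = transitions[(state, player, action)]
    if state in terminal:
        winner = terminal[state]
        return state, "draw" if winner is None else f"{winner} wins"
    return state, "ongoing"


# Assumed linear chain q1 -> ... -> q5 with alternating players; q5 is a draw.
transitions: Transitions = {
    ("q1", "p1", "a1"): "q2",
    ("q2", "p2", "a2"): "q3",
    ("q3", "p1", "a3"): "q4",
    ("q4", "p2", "a4"): "q5",
}
terminal = {"q5": None}
moves = [("p1", "a1"), ("p2", "a2"), ("p1", "a3"), ("p2", "a4")]
print(run("q1", moves, transitions, terminal))  # ('q5', 'draw')
```

Marking terminal states with the winner (or `None` for a draw) keeps the game outcome recoverable from the final state, which is the signal an outcome-based ranking of query-response pairs would need.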