STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making

Chuanhao Li, Runhan Yang, Tiankai Li, Milad Bafarassat, Kourosh Sharifi, Dirk Bergemann, Zhuoran Yang

TL;DR

This work tackles the difficulty of deploying LLMs for strategic, multi-agent decision-making by introducing STRIDE, a memory-augmented, tool-assisted LLM agent framework. STRIDE uses an LLM-powered reasoning module to produce structured sequences of Thought units and couples it with an external working memory and domain-specific operational tools. With this design, STRIDE emulates algorithms such as value iteration, UCB-VI, and backward induction in MDPs, dynamic mechanism design, and bargaining games. Empirical results show that STRIDE achieves higher policy accuracy, faster convergence, and better equilibrium outcomes than baselines, including in unknown-model settings where exploration is crucial. The findings suggest a promising direction for robust, interactive LLM agents capable of strategic reasoning in complex, economically relevant environments.
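Backward induction, one of the algorithms STRIDE is reported to emulate in bargaining games, can be illustrated in isolation with a short Python sketch. This is not STRIDE's tool or the paper's exact bargaining setup. It assumes a standard finite-horizon alternating-offers game: two players split a pie of size 1, player 0 proposes in even rounds and player 1 in odd rounds, player i discounts each round of delay by `deltas[i]`, an indifferent responder accepts, and both players receive 0 if no offer is accepted within `horizon` rounds. The function name `backward_induction_bargaining` is hypothetical.

```python
from typing import List, Tuple


def backward_induction_bargaining(horizon: int,
                                  deltas: Tuple[float, float]) -> List[Tuple[float, float]]:
    """Subgame-perfect offers in a finite alternating-offers game over a pie of size 1.

    Assumption: player 0 proposes in even rounds, player 1 in odd rounds.
    Returns, for each round t, the proposed split (share of player 0, share of
    player 1), measured in round-t units (before discounting back to round 0).
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # keep[t]: share that the round-t proposer keeps in equilibrium.
    keep = [0.0] * horizon
    keep[horizon - 1] = 1.0  # last round: the responder accepts any offer >= 0
    # Work backwards: offer the responder exactly its discounted continuation value.
    for t in range(horizon - 2, -1, -1):
        responder = (t + 1) % 2  # the round-t responder proposes in round t + 1
        keep[t] = 1.0 - deltas[responder] * keep[t + 1]
    splits = []
    for t in range(horizon):
        if t % 2 == 0:  # player 0 proposes
            splits.append((keep[t], 1.0 - keep[t]))
        else:           # player 1 proposes
            splits.append((1.0 - keep[t], keep[t]))
    return splits
```

For example, `backward_induction_bargaining(3, (0.5, 0.5))[0]` returns `(0.75, 0.25)`: player 0 keeps three quarters of the pie in the first round because player 1's continuation value, discounted by one round, is only 0.25. With equal discount factors δ and a long horizon, the first proposer's share approaches the Rubinstein value 1/(1+δ).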

Abstract

Large Language Models (LLMs) like GPT-4 have revolutionized natural language processing, showing remarkable linguistic proficiency and reasoning capabilities. However, their application in strategic multi-agent decision-making environments is hampered by significant limitations, including poor mathematical reasoning, difficulty in following instructions, and a tendency to generate incorrect information. These deficiencies hinder their performance in strategic and interactive tasks that demand adherence to nuanced game rules, long-term planning, exploration in unknown environments, and anticipation of opponents' moves. To overcome these obstacles, this paper presents a novel LLM agent framework equipped with memory and specialized tools to enhance LLMs' strategic decision-making capabilities. We deploy the tools in a number of economically important environments, in particular bilateral bargaining and multi-agent and dynamic mechanism design. We employ quantitative metrics to assess the framework's performance in various strategic decision-making problems. Our findings establish that our enhanced framework significantly improves the strategic decision-making capability of LLMs. While we highlight the inherent limitations of current LLMs, we demonstrate the improvements through targeted enhancements, suggesting a promising direction for future developments in LLM applications for interactive environments.

Paper Structure

This paper contains 25 sections, 8 equations, 4 figures, 9 tables, and 10 algorithms.

Figures (4)

  • Figure 1: Illustration of the STRIDE framework, which consists of a reasoning module powered by LLMs, a working memory that stores important parameters of the problem instance and intermediate results of the reasoning process, and tools that facilitate reasoning (handling low-level computation and managing the working memory) and acting (converting reasoning texts into executable actions).
  • Figure 2: In the STRIDE framework, the LLM controls the execution of operations and access to working memory through a sequence of Thought units. Each Thought unit is a structured data object with three fields: (i) text, which suggests the next step of strategic reasoning and summarizes important problem parameters; (ii) operations, a list of operations to execute in order to compute or retrieve the information needed for reasoning; and (iii) exit, a boolean value indicating whether the reasoning process is complete. With proper demonstrations and operational tools, STRIDE can implement various algorithmic behaviors (the value iteration algorithm shown here is one example) to facilitate strategic decision-making.
  • Figure 3: The agent's objective in the Highway Environment is to control the ego-vehicle (the green box) so that it reaches a high speed while avoiding collisions with the other vehicles (the blue boxes).
  • Figure 4: Comparison of cumulative rewards across episodes. Both STRIDE and UCB-VI exhibit rapid increases in cumulative reward and converge by approximately the 10th episode, indicating that STRIDE explores the environment effectively by emulating UCB-VI in its reasoning process. In contrast, the cumulative rewards of the other baseline methods keep fluctuating throughout the episodes, reflecting poor exploration in uncertain environments.
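To make the Thought-unit loop described for Figure 2 concrete, the following Python sketch runs finite-horizon value iteration as a sequence of Thought units that read and write a working-memory dictionary. It is an illustration under stated assumptions, not the paper's implementation: `scripted_reasoner` is a hypothetical stand-in for the LLM (which in STRIDE generates the Thought units), `init_values` and `bellman_backup` are hypothetical operational tools, and the two-state MDP, its dictionary layout, and the undiscounted finite horizon are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Memory = Dict[str, Any]  # working memory: problem parameters and intermediate results


@dataclass
class Thought:
    """A Thought unit with the three fields described for Figure 2."""
    text: str                                     # next reasoning step in natural language
    operations: List[Tuple[str, Dict[str, Any]]]  # (tool name, keyword arguments)
    exit: bool = False                            # True once reasoning is complete


def init_values(mem: Memory) -> None:
    """Hypothetical tool: set V_H(s) = 0 for every state at the horizon H."""
    mem["V"] = {mem["horizon"]: [0.0] * len(mem["R"])}
    mem["policy"] = {}


def bellman_backup(mem: Memory, h: int) -> None:
    """Hypothetical tool: compute V_h and a greedy policy from V_{h+1}."""
    P, R, v_next = mem["P"], mem["R"], mem["V"][h + 1]
    v_h, pi_h = [], []
    for s in range(len(R)):
        # Q_h(s, a) = R(s, a) + sum over s' of P(s' | s, a) * V_{h+1}(s')
        q = [R[s][a] + sum(p * v_next[s2] for p, s2 in P[s][a])
             for a in range(len(R[s]))]
        best = max(range(len(q)), key=q.__getitem__)  # first maximizing action
        v_h.append(q[best])
        pi_h.append(best)
    mem["V"][h] = v_h
    mem["policy"][h] = pi_h


TOOLS: Dict[str, Callable[..., None]] = {
    "init_values": init_values,
    "bellman_backup": bellman_backup,
}


def scripted_reasoner(mem: Memory, step: int) -> Thought:
    """Hypothetical stand-in for the LLM: emits one Thought unit per step."""
    if step == 0:
        return Thought("Initialize V_H(s) = 0 for every state.", [("init_values", {})])
    h = mem["horizon"] - step
    return Thought(f"Compute V_{h} from V_{h + 1} with a Bellman backup.",
                   [("bellman_backup", {"h": h})], exit=(h == 0))


def run_reasoning_loop(reasoner, tools, mem: Memory, max_steps: int = 100) -> List[str]:
    """Execute each Thought's operations in order until a Thought sets exit=True."""
    trace = []
    for step in range(max_steps):
        thought = reasoner(mem, step)
        trace.append(thought.text)
        for name, kwargs in thought.operations:
            tools[name](mem, **kwargs)
        if thought.exit:
            break
    return trace


# Assumed example: in state 0, action 1 moves to state 1; in state 1, action 0 stays
# there and earns reward 1 per step. All transitions are deterministic.
memory: Memory = {
    "horizon": 3,
    "R": [[0.0, 0.0], [1.0, 0.0]],                              # R[s][a]
    "P": [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)], [(1.0, 0)]]],  # P[s][a] -> [(prob, s')]
}
trace = run_reasoning_loop(scripted_reasoner, TOOLS, memory)
print(memory["V"][0], memory["policy"][0])
```

Running it prints `[2.0, 3.0] [1, 0]`: from state 0 the optimal first action moves to state 1, and the loop stops as soon as a Thought unit sets `exit=True`. Keeping the arithmetic inside tools rather than in generated text mirrors the framework's stated motivation of offloading low-level computation from the LLM.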