Large Legislative Models: Towards Efficient AI Policymaking in Economic Simulations

Henry Gasztowtt, Benjamin Smith, Vincent Zhu, Qinxun Bai, Edwin Zhang

TL;DR

This work proposes a novel method that uses pre-trained Large Language Models (LLMs) as sample-efficient policymakers in socially complex multi-agent reinforcement learning (MARL) scenarios, and demonstrates significant efficiency gains.

Abstract

The improvement of economic policymaking presents an opportunity for broad societal benefit, a notion that has inspired research towards AI-driven policymaking tools. AI policymaking holds the potential to surpass human performance through the ability to process data quickly at scale. However, existing RL-based methods exhibit sample inefficiency and are further limited by an inability to flexibly incorporate nuanced information into their decision-making processes. Thus, we propose a novel method in which we instead utilize pre-trained Large Language Models (LLMs) as sample-efficient policymakers in socially complex multi-agent reinforcement learning (MARL) scenarios. We demonstrate significant efficiency gains, outperforming existing methods across three environments. Our code is available at https://github.com/hegasz/large-legislative-models.

Paper Structure

This paper contains 47 sections, 2 equations, 16 figures, 3 tables, 1 algorithm.

Figures (16)

  • Figure 1: Comparison of policymaker (principal) performance in the Commons Harvest Open environment. Our method demonstrates superior sample efficiency over all baselines. Each method is run on 10 seeds.
  • Figure 2: Ablation results for AI Economist on GTB and MetaGrad on Escape Room. Experiments were repeated over 3 and 10 seeds respectively.
  • Figure 3: LLM Principal. At the beginning of each episode: 1.) A prompt is built from three components: Context, an overview of the problem setting; Demonstrations, a record of previous principal actions and associated outcomes; and Query, a request for the next action $\phi$. 2.) An LLM takes in this prompt and produces $\phi$. 3.) $\phi$ induces a POMG $\mathcal{M}^\phi$. Agents train until reaching best-response policy $\pi^*$ under $\phi$, and we then evaluate $(\phi, \pi^*)$ within $\mathcal{M}^\phi$, yielding principal observation trajectory $\tau_P\in \Omega_P^*$ and principal payoff $u_0(\phi, \pi^*)$. 4.) After evaluation, we process $\tau_P$, extracting historical data beyond $u_0(\phi, \pi^*)$. 5.) Payoff and historical data are appended to the prompt history.
  • Figure 4: Performance comparison of different policymaker methods across the Harvest, Clean Up, and CER environments. Each plot displays the principal's reward over validation timesteps. Dashed lines represent principal payoff upon convergence to a policy. In addition to two frontier LLMs, we include the RL-based methods AI Economist and MetaGrad, as well as three bandit algorithms: UCB, Thompson sampling, and $\epsilon$-greedy. Both instantiations of the LLM principal consistently achieve higher sample efficiency than baselines across all environments. For each environment, we include a closer view of LLM performance in early timesteps (top) as well as the full run (bottom). All methods were run on 10 seeds.
  • Figure 5: Incentives over time in the Clean Up environment for Gemini-1.5 flash (top) and AI Economist (bottom). From left to right, graphs correspond to the incentive for harvesting, cleaning, and other. Note the LLM x-axis is an order of magnitude smaller than the AI Economist x-axis.
  • ...and 11 more figures
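
The five-step loop in the Figure 3 caption can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (Demonstration, build_prompt, summarize, run_principal, stub_llm, stub_evaluate) is hypothetical, the action $\phi$ is reduced to a single float, and both the LLM call and the agent training and evaluation step are replaced by deterministic stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Demonstration:
    """One past principal action with its observed outcome (hypothetical structure)."""
    action: float   # principal action phi, simplified to a scalar
    payoff: float   # principal payoff u_0(phi, pi*)
    history: str    # extra data extracted from the trajectory tau_P


def build_prompt(context: str, demos: List[Demonstration], query: str) -> str:
    """Step 1: assemble Context, Demonstrations and Query into one prompt."""
    lines = [context, "Demonstrations:"]
    for d in demos:
        lines.append(f"action={d.action:.3f} payoff={d.payoff:.3f} notes={d.history}")
    lines.append(query)
    return "\n".join(lines)


def summarize(trajectory: List[float]) -> str:
    """Step 4: reduce the observation trajectory to a short text summary."""
    return f"mean_obs={sum(trajectory) / len(trajectory):.3f}"


def run_principal(
    llm: Callable[[str], float],
    evaluate: Callable[[float], Tuple[float, List[float]]],
    context: str,
    query: str,
    episodes: int,
) -> List[Demonstration]:
    """Run the prompt -> action -> evaluation -> history loop for a fixed number of episodes."""
    demos: List[Demonstration] = []
    for _ in range(episodes):
        prompt = build_prompt(context, demos, query)   # step 1
        phi = llm(prompt)                              # step 2
        payoff, trajectory = evaluate(phi)             # step 3 (agents trained inside)
        demos.append(Demonstration(phi, payoff, summarize(trajectory)))  # steps 4-5
    return demos


def stub_llm(prompt: str) -> float:
    """Stand-in for an LLM: proposes 0.2 times the number of demonstrations seen."""
    seen = sum(1 for line in prompt.splitlines() if line.startswith("action="))
    return 0.2 * seen


def stub_evaluate(phi: float) -> Tuple[float, List[float]]:
    """Stand-in for agent training and evaluation: payoff peaks at phi = 0.6."""
    return -(phi - 0.6) ** 2, [phi, phi / 2]


if __name__ == "__main__":
    for d in run_principal(stub_llm, stub_evaluate, "Set a tax rate.", "Next action?", 4):
        print(d)
```

In a real setting, stub_llm would be replaced by a call to a language model plus parsing of its reply, and stub_evaluate by training the agents to a best response under $\phi$ and rolling out the induced game; the sketch only shows how the prompt history grows by one demonstration per episode.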

Theorems & Definitions (5)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition D.1