AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution

Zhiqiang Xie, Hao Kang, Ying Sheng, Tushar Krishna, Kayvon Fatahalian, Christos Kozyrakis

TL;DR

AI Metropolis is a simulation engine that improves the efficiency of LLM agent simulations through out-of-order execution scheduling. It dynamically tracks real dependencies between agents, which increases parallelism and enables efficient hardware utilization.

Abstract

With more advanced natural language understanding and reasoning capabilities, large language model (LLM)-powered agents are increasingly developed in simulated environments to perform complex tasks, interact with other agents, and exhibit emergent behaviors relevant to social science and gaming. However, current multi-agent simulations frequently suffer from inefficiencies due to the limited parallelism caused by false dependencies, resulting in performance bottlenecks. In this paper, we introduce AI Metropolis, a simulation engine that improves the efficiency of LLM agent simulations by incorporating out-of-order execution scheduling. By dynamically tracking real dependencies between agents, AI Metropolis minimizes false dependencies, enhancing parallelism and enabling efficient hardware utilization. Our evaluations demonstrate that AI Metropolis achieves speedups from 1.3x to 4.15x over standard parallel simulation with global synchronization, approaching optimal performance as the number of agents increases.
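The cost of false dependencies described in the abstract can be illustrated with a toy cost model. The sketch below is not the AI Metropolis scheduler. It assumes unlimited parallel hardware, fixed per-step latencies, and agents that never interact, so every cross-agent dependency is false. The function names and latency values are hypothetical.

```python
from typing import List

# durations[s][a] = hypothetical latency (seconds) of agent a's LLM work at step s.
Durations = List[List[float]]


def lockstep_makespan(durations: Durations) -> float:
    """Global synchronization: every step ends with a barrier, so each
    step costs as much as its slowest agent."""
    return sum(max(step) for step in durations)


def dependency_free_makespan(durations: Durations) -> float:
    """Idealized out-of-order execution when no agent depends on another:
    each agent runs its steps back to back, so the total time is the
    longest per-agent chain."""
    n_agents = len(durations[0])
    return max(sum(step[a] for step in durations) for a in range(n_agents))


if __name__ == "__main__":
    # Two agents, two steps; each agent is slow in a different step.
    toy = [[1.0, 4.0], [4.0, 1.0]]
    sync = lockstep_makespan(toy)
    ooo = dependency_free_makespan(toy)
    print(f"lockstep: {sync}s, out-of-order: {ooo}s, speedup: {sync / ooo:.2f}x")
```

In this toy input the barrier version takes 8 s and the dependency-free version takes 5 s, because the barrier forces each agent to wait for the other's slow step. These numbers only illustrate the mechanism and are unrelated to the speedups reported in the paper.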

Paper Structure

This paper contains 37 sections, 2 equations, 7 figures, 1 table, 3 algorithms.

Figures (7)

  • Figure 1: A snippet of the execution trace of a simulation. The x-axis shows the elapsed execution time, with each row representing an agent’s stream of LLM invocations. Colored bars denote different agent functions, and black dashed vertical lines indicate the completion of each step.
  • Figure 2: The dependency between agents' tasks is introduced by temporal causality. The top illustration shows an overly strict enforcement of this dependency, while the bottom illustration depicts a case of actual dependency.
  • Figure 3: An example of a spatiotemporal dependency graph. Each node, such as A@x, represents an agent (A) at a specific time step (x). Single arrows indicate dependencies, while double arrows represent coupled relationships between agents. Purple boxes denote clusters of agents, where green nodes indicate agents that are ready for execution and orange nodes represent blocked agents.
  • Figure 4: (L4 and A100 panels) End-to-end completion time of a full-day simulation with 25 agents on different numbers of GPUs. (Call-distribution panel) Distribution of LLM calls over the simulated hours; the low-activity period from 1 a.m. to 4 a.m. occurs because all agents are sleeping.
  • Figure 5: Benchmark of busy (12 p.m. - 1 p.m.) and quiet (6 a.m. - 7 a.m.) hours using Llama-3-8b-instruct on NVIDIA L4 GPUs, with agent counts scaled from 25 to 1000. Single-thread results for 500 and 1000 agents are projected based on workload estimations.
  • ...and 2 more figures
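
The ready, blocked, and coupled states named in the Figure 3 caption can be sketched with a small data structure. The rule used here is an illustrative assumption, not the paper's algorithm: an agent is blocked by a lagging agent if that agent could move into its perception range before catching up, and two agents at the same step are coupled if they are within perception range. The parameters `radius` (perception range) and `speed` (maximum movement per step), and all names below, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class AgentState:
    name: str
    step: int                 # simulation step the agent wants to execute
    pos: Tuple[float, float]  # position at that step


def is_blocked_by(a: AgentState, b: AgentState, radius: float, speed: float) -> bool:
    """True if `a` must wait for the lagging agent `b` (assumed rule)."""
    if b.step >= a.step:
        return False
    # Farthest distance from which b could still enter a's perception
    # range by the time b reaches a's step.
    reach = radius + speed * (a.step - b.step)
    return math.dist(a.pos, b.pos) <= reach


def coupled(a: AgentState, b: AgentState, radius: float) -> bool:
    """Same step and within perception range: must advance together."""
    return a.step == b.step and math.dist(a.pos, b.pos) <= radius


def ready_agents(agents: List[AgentState], radius: float, speed: float) -> Set[str]:
    """Names of agents that no lagging agent blocks."""
    return {
        a.name
        for a in agents
        if not any(is_blocked_by(a, b, radius, speed) for b in agents if b is not a)
    }


def clusters(agents: List[AgentState], radius: float) -> List[Set[str]]:
    """Connected components of the coupled relation."""
    remaining = {a.name: a for a in agents}
    groups: List[Set[str]] = []
    while remaining:
        _, seed = remaining.popitem()
        group, frontier = {seed.name}, [seed]
        while frontier:
            cur = frontier.pop()
            linked = [n for n, other in remaining.items() if coupled(cur, other, radius)]
            for n in linked:
                frontier.append(remaining.pop(n))
                group.add(n)
        groups.append(group)
    return groups


if __name__ == "__main__":
    world = [
        AgentState("A", 3, (0.0, 0.0)),
        AgentState("B", 3, (1.0, 0.0)),
        AgentState("C", 1, (4.0, 0.0)),   # lagging and close to A and B
        AgentState("D", 5, (50.0, 0.0)),  # far ahead in time, far away in space
    ]
    print("ready:", sorted(ready_agents(world, radius=2.0, speed=1.0)))
    print("clusters:", sorted(sorted(g) for g in clusters(world, radius=2.0)))
```

In this example A and B form one cluster and are blocked by the nearby lagging agent C, while D can run ahead of everyone because no lagging agent is close enough to affect it. A scheduler built on such a structure would dispatch the ready agents and re-evaluate the graph after each one commits its step.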