Bounded Autonomy: Controlling LLM Characters in Live Multiplayer Games

Yunjia Guo, Jinghan Zhu, Siyu Wang, Haixin Qiao

Abstract

Large language models (LLMs) are bringing richer dialogue and social behavior into games, but they also expose a control problem that existing game interfaces do not directly address: how should LLM characters participate in live multiplayer interaction while remaining executable in the shared game world, socially coherent with other active characters, and steerable by players when needed? We frame this problem as bounded autonomy, a control architecture for live multiplayer games that organizes LLM character control around three interfaces: agent-agent interaction, agent-world action execution, and player-agent steering. We instantiate bounded autonomy with probabilistic reply-chain decay, an embedding-based action grounding pipeline with fallback, and whisper, a lightweight soft-steering technique that lets players influence a character's next move without fully overriding autonomy. We deploy this architecture in a live multiplayer social game and study its behavior through analyses of interaction stability, grounding quality, whisper intervention success, and formative interviews. Our results show how bounded autonomy makes LLM character interaction workable in practice, frames controllability as a distinct runtime control problem for LLM characters in live multiplayer games, and provides a concrete exemplar for future games built around this interaction paradigm.

Paper Structure

This paper contains 14 sections, 1 equation, 4 figures, 5 tables.

Figures (4)

  • Figure 1: System architecture for bounded autonomy. The game client captures player input and renders broadcast character behavior, the game server maintains world state and routes events, and the AI service performs priority arbitration, LLM inference, and action grounding. Three named control interfaces span the runtime pipeline: Whisper for player-to-agent soft steering, Converge for agent-to-agent reply arbitration and reply-chain decay, and Ground for agent-to-world executable action grounding with safe fallback.
  • Figure 2: Mechanism of reply-chain decay in Converge. A source-0 injected event can trigger a reply chain under Priority B. Each propagated reply increments source depth, and at each hop continuation is re-sampled according to Eq. 1, making deeper reply chains progressively less likely to continue. The numeric values shown illustrate the deployed setting $\alpha = 0.2$, for which $P_{\text{reply}}(1)=1.0$, $P_{\text{reply}}(2)=0.8$, $P_{\text{reply}}(3)=0.6$, and the continuation probability reaches zero by $s \geq 6$.
  • Figure 3: Ground pipeline for translating open-ended model output into executable game behavior. The system routes the input to the appropriate candidate pool or pool pair, prunes emotionally contradictory bundles, retrieves the nearest executable bundle or bundle pair by embedding similarity, and executes either the matched result or a safe fallback depending on thresholded confidence.
  • Figure 4: Two execution paths for whisper handling. For to-other whispers, the system uses the whisper to guide LLM bundle-pair selection, then grounds the selected bundle names to executable behavior bundles; for talk actions, the whisper also conditions dialogue generation. For to-self whispers, the system bypasses LLM bundle selection and directly matches the whisper against the to-self bundle pool with threshold-based fallback. Across the system, the three candidate pools contain 378 executable behavior bundles.
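
The retrieval and fallback steps described in the Figure 3 caption can be sketched as a nearest-neighbor lookup with a confidence threshold. This is a minimal illustration only: the toy two-dimensional vectors, the bundle names, the `threshold` value, and the `idle` fallback are all hypothetical, and the pool routing and emotional-contradiction pruning steps are omitted. A real system would obtain vectors from an embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def ground(
    query_vec: list[float],
    pool: dict[str, list[float]],
    threshold: float = 0.5,   # ASSUMPTION: illustrative value, not from the paper
    fallback: str = "idle",   # ASSUMPTION: hypothetical safe-fallback bundle
) -> tuple[str, float]:
    """Return the nearest executable bundle, or the fallback if confidence is low."""
    best_name, best_score = None, -1.0
    for name, vec in pool.items():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return fallback, best_score
    return best_name, best_score


if __name__ == "__main__":
    toy_pool = {"wave": [1.0, 0.0], "sit": [0.0, 1.0]}  # hypothetical bundles
    print(ground([1.0, 0.0], toy_pool))                 # confident match
    print(ground([1.0, 1.0], toy_pool, threshold=0.9))  # falls back
```

The point of the threshold is that open-ended model output never reaches the game world directly: either it maps to a known executable bundle with sufficient similarity, or a safe default runs instead.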