LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

Jijia Liu, Chao Yu, Jiaxuan Gao, Yuqing Xie, Qingmin Liao, Yi Wu, Yu Wang

TL;DR

This work tackles the latency bottleneck of LLM-powered agents in real-time human-AI coordination. It introduces a Hierarchical Language Agent (HLA) that combines three modules: a proficient Slow Mind for intention reasoning and dialogue, a lightweight Fast Mind for macro-action generation, and an Executor for high-frequency execution of atomic actions. In simulations and human studies in the Overcooked game, HLA responds significantly faster, achieves higher cooperative scores, and is preferred more strongly by human players than the baselines. The results highlight the practical potential of hierarchical architectures that pair slow, deliberate reasoning with fast action generation for real-time, language-mediated human-AI collaboration, with avenues for future improvement using larger LLMs and RL-based low-level control.

Abstract

AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing complex, manually designed prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive, real-time applications such as gaming. Traditional gaming AI often employs small models or reactive policies, which enable fast inference but offer limited task completion and interaction abilities. In this work, we use Overcooked as our testbed, where players can communicate in natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while maintaining real-time execution. In particular, HLA adopts a hierarchical framework comprising three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction; a lightweight LLM, referred to as Fast Mind, for generating macro actions; and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communication.
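
The three-module split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every class, function, and action name below is hypothetical, both LLMs are replaced by plain Python functions, and the loop is synchronous for simplicity. It only shows the division of labor, in which the Slow Mind is consulted when the human speaks, while the Fast Mind and Executor run on every step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HierarchicalAgent:
    """Hypothetical three-module agent; all names are illustrative assumptions."""
    slow_mind: Callable[[str, str], str]   # (chat message, state) -> inferred intention
    fast_mind: Callable[[str, str], str]   # (intention, state) -> macro action
    executor: Callable[[str], List[str]]   # macro action -> atomic actions
    intention: str = "no instruction yet"

    def on_human_message(self, message: str, state: str) -> None:
        # Slow path: only invoked when the human says something.
        self.intention = self.slow_mind(message, state)

    def step(self, state: str) -> List[str]:
        # Fast path: invoked every step, reusing the latest inferred intention.
        macro_action = self.fast_mind(self.intention, state)
        return self.executor(macro_action)


# Stand-ins for the two LLMs and the reactive policy (assumptions, not real models).
def stub_slow_mind(message: str, state: str) -> str:
    return "human wants: " + message.lower()


def stub_fast_mind(intention: str, state: str) -> str:
    return "chop tomato" if "tomato" in intention else "wait"


def stub_executor(macro_action: str) -> List[str]:
    table = {"chop tomato": ["move_left", "interact"], "wait": ["stay"]}
    return table.get(macro_action, ["stay"])


if __name__ == "__main__":
    agent = HierarchicalAgent(stub_slow_mind, stub_fast_mind, stub_executor)
    print(agent.step("kitchen"))                      # before any instruction
    agent.on_human_message("Please chop a Tomato", "kitchen")
    print(agent.step("kitchen"))                      # after the instruction
```

Keeping the inferred intention as shared state is one simple way to let the fast path keep acting without waiting for the slow path; a real system would presumably run the two concurrently.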

Paper Structure (51 sections, 1 equation, 26 figures, 22 tables)

This paper contains 51 sections, 1 equation, 26 figures, 22 tables.

Figures (26)

  • Figure 1: A concrete example of cooperation and communication between a human player and an AI player in Overcooked.
  • Figure 1: Workflow of Slow-Mind-Only Agent. The Fast Mind is discarded, and Slow Mind generates macro actions directly.
  • Figure 2: The cooking process and the maps in the Overcooked testbed.
  • Figure 2: Workflow of Fast-Mind-Only Agent. The Slow Mind is discarded, and Fast Mind generates chat messages directly.
  • Figure 3: Framework of Hierarchical Language Agent, including a Slow Mind for intention reasoning and language interaction, a Fast Mind for macro-action generation, and an Executor to execute atomic actions.
  • ...and 21 more figures