Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems

Fulin Lin, Shaowen Chen, Ruishan Fang, Hongwei Wang, Tao Lin

TL;DR

The paper addresses robustness and efficiency bottlenecks in increasingly autonomous Multi-Agent Systems (MAS) by introducing SupervisorAgent, a lightweight, non-intrusive meta-agent that supervises interactions at runtime without modifying the base agents. It formalizes a Supervised Multi-Agent System (SMAS) with a memory-augmented context window and an adaptive, LLM-free filter that triggers a spectrum of interventions at high-risk interaction points (agent-agent, agent-tool, agent-memory). The approach reduces token cost substantially (29.45% on average on GAIA with the Smolagent framework) while preserving or improving task success across GAIA and five additional benchmarks, and it generalizes across foundation models and MAS frameworks. These results point to runtime supervision as a practical route to robust and efficient large-scale agentic systems.

Abstract

While Multi-Agent Systems (MAS) excel at complex tasks, their growing autonomy with operational complexity often leads to critical inefficiencies, such as excessive token consumption and failures arising from misinformation. Existing methods primarily focus on post-hoc failure attribution, lacking proactive, real-time interventions to enhance robustness and efficiency. To this end, we introduce SupervisorAgent, a lightweight and modular framework for runtime, adaptive supervision that operates without altering the base agent's architecture. Triggered by an LLM-free adaptive filter, SupervisorAgent intervenes at critical junctures to proactively correct errors, guide inefficient behaviors, and purify observations. On the challenging GAIA benchmark, SupervisorAgent reduces the token consumption of the Smolagent framework by an average of 29.45% without compromising its success rate. Extensive experiments across five additional benchmarks (math reasoning, code generation, and question answering) and various SoTA foundation models validate the broad applicability and robustness of our approach. The code is available at https://github.com/LINs-lab/SupervisorAgent.
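
To make the idea of an LLM-free filter concrete, the following is a minimal sketch of a rule-based gate that decides whether an interaction should be escalated to a supervisor. It is not the authors' implementation: the `Interaction` type, the `adaptive_filter` function, the error markers, and both thresholds are hypothetical choices made for illustration. Only the three intervention goals (correcting errors, guiding inefficient behavior, purifying observations) come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interaction:
    """One monitored step. All field names are hypothetical."""
    kind: str          # "agent-agent", "agent-tool", or "agent-memory"
    action: str        # what the agent attempted
    observation: str   # what came back


# Assumed heuristics; the paper's actual trigger rules may differ.
ERROR_MARKERS = ("error", "traceback", "exception")
MAX_OBSERVATION_CHARS = 2000
LOOP_WINDOW = 3


def adaptive_filter(history: List[Interaction],
                    current: Interaction) -> Optional[str]:
    """Return a suggested intervention, or None to let the step pass.

    The check uses only string matching and counting, so it costs no
    LLM tokens; a supervisor model would be called only on a non-None result.
    """
    text = current.observation.lower()
    if any(marker in text for marker in ERROR_MARKERS):
        return "correct_error"

    # Same action repeated LOOP_WINDOW times in a row suggests a loop.
    recent = [step.action for step in history[-LOOP_WINDOW:]]
    if len(recent) == LOOP_WINDOW and all(a == current.action for a in recent):
        return "guide_inefficiency"

    # Very long observations are candidates for condensing.
    if len(current.observation) > MAX_OBSERVATION_CHARS:
        return "purify_observation"

    return None
```

The design point this sketch illustrates is that the cheap checks run on every interaction, while the expensive supervision step runs only on the flagged ones.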

Paper Structure

This paper contains 48 sections, 2 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The SupervisorAgent Framework: Concept and Impact. (a) Illustrative examples of common failure modes in MAS, including error propagation and inefficient loops, and the corresponding intervention by our SupervisorAgent. (b) An overview of a conventional MAS, highlighting the high-risk interaction loci (agent-agent, agent-tool, agent-memory) where such failures occur. (c) The core workflow of our SupervisorAgent, which monitors these interactions to provide real-time intervention. (d) The resulting Supervised MAS (SMAS), which integrates the SupervisorAgent to enhance robustness and efficiency. (e) Performance on GAIA (Level 2), where SMAS (blue) reduces token cost by 35% and variance by 63% versus the baseline (red).
  • Figure 2: The architecture and workflow of SupervisorAgent. (a) The LLM-free adaptive filter for identifying high-risk interactions. (b) The context window, aggregating goals and traces for situational awareness. (c) The spectrum of intervention actions, from simple approval to intensive verification. (d, e) Case study on a GAIA task, comparing the baseline MAS (d) with our SMAS (e), which cuts steps by 43% and token cost by over 70%. (f) The supervise workflow for an interaction, from filtering to a final supervision action.
  • Figure 3: SupervisorAgent enhances performance consistency on the GAIA benchmark. (a) Violin plots of token cost distributions, revealing the more compact and predictable performance of our Supervised MAS (SMAS). (b) A direct comparison quantifying the substantial reduction in token cost variance achieved by our SMAS across all difficulty levels.
  • Figure 4: Ablation study and model generalization of SupervisorAgent. (a) Ablation study on challenging GAIA tasks, dissecting the distinct contributions of each module to the framework's overall efficiency and robustness. (b) Validation of model-agnosticism, showing that SupervisorAgent consistently delivers token savings across diverse foundation models.

Theorems & Definitions (2)

  • Definition 1: Supervised Multi-Agent System (SMAS)
  • Definition 2: Context Window