Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems

Fulin Lin; Shaowen Chen; Ruishan Fang; Hongwei Wang; Tao Lin

TL;DR

The paper tackles robustness and efficiency bottlenecks in increasingly autonomous Multi-Agent Systems (MAS) by introducing SupervisorAgent, a lightweight, non-intrusive meta-agent that supervises interactions at runtime without modifying the base agents. It formalizes a Supervised Multi-Agent System (SMAS) with a memory-augmented context window and an adaptive, LLM-free filter that triggers a range of interventions at three high-risk interaction points: Agent-Agent, Agent-Tool, and Agent-Memory. On GAIA, the approach reduces the token consumption of the Smolagent framework by an average of 29.45% without lowering its success rate. Experiments on five additional benchmarks and several foundation models indicate that the method generalizes across both models and MAS frameworks, which supports runtime supervision as a practical route to robust and efficient large-scale agentic systems.

Abstract

While Multi-Agent Systems (MAS) excel at complex tasks, their growing autonomy and operational complexity often lead to critical inefficiencies, such as excessive token consumption and failures arising from misinformation. Existing methods focus primarily on post-hoc failure attribution and lack proactive, real-time interventions that enhance robustness and efficiency. To this end, we introduce SupervisorAgent, a lightweight and modular framework for adaptive runtime supervision that operates without altering the base agent's architecture. Triggered by an LLM-free adaptive filter, SupervisorAgent intervenes at critical junctures to proactively correct errors, guide inefficient behaviors, and purify observations. On the challenging GAIA benchmark, SupervisorAgent reduces the token consumption of the Smolagent framework by an average of 29.45% without compromising its success rate. Extensive experiments across five additional benchmarks (math reasoning, code generation, and question answering) and various SoTA foundation models validate the broad applicability and robustness of our approach. The code is available at https://github.com/LINs-lab/SupervisorAgent.
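
To make the idea of an LLM-free trigger concrete, the sketch below shows one way a rule-based filter could decide when to call a supervisor. It is an illustration only, not the authors' implementation: the `Step` record, the `adaptive_filter` function, the error markers, and both thresholds are hypothetical names and values chosen for this example. The three intervention types mirror the three goals named in the abstract (correct errors, guide inefficient behaviors, purify observations).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Intervention(Enum):
    """Intervention types, named after the three goals in the abstract."""
    CORRECT_ERROR = "correct_error"
    GUIDE_BEHAVIOR = "guide_behavior"
    PURIFY_OBSERVATION = "purify_observation"


@dataclass
class Step:
    """One agent interaction (hypothetical record, not from the paper)."""
    action: str        # what the agent did, e.g. a tool-call string
    observation: str   # what came back from the tool, agent, or memory


# Assumed values for illustration; the paper's actual rules may differ.
ERROR_MARKERS = ("error", "exception", "traceback")
REPEAT_LIMIT = 3               # identical actions in a row that count as a loop
MAX_OBSERVATION_CHARS = 2000   # observations longer than this get purified


def adaptive_filter(history: List[Step], step: Step) -> Optional[Intervention]:
    """Return the intervention to trigger for `step`, or None to stay silent.

    Only cheap string checks are used, so no LLM call is spent on deciding
    whether the supervisor should run.
    """
    # Rule 1: the observation looks like a failure.
    lowered = step.observation.lower()
    if any(marker in lowered for marker in ERROR_MARKERS):
        return Intervention.CORRECT_ERROR

    # Rule 2: the agent repeats the same action REPEAT_LIMIT times in a row.
    recent = [s.action for s in history[-(REPEAT_LIMIT - 1):]]
    if len(recent) == REPEAT_LIMIT - 1 and all(a == step.action for a in recent):
        return Intervention.GUIDE_BEHAVIOR

    # Rule 3: the observation is long enough to bloat the context window.
    if len(step.observation) > MAX_OBSERVATION_CHARS:
        return Intervention.PURIFY_OBSERVATION

    return None
```

In this sketch most steps return `None`, so the supervisor (and its own token cost) is only invoked on the small share of steps that look risky. That is the intuition behind pairing supervision with a cheap filter.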
