Reducing Cognitive Overhead in Tool Use via Multi-Small-Agent Reinforcement Learning
Dayu Wang, Jiaye Yang, Weikang Li, Jiahui Liang, Yang Li
TL;DR
MSARL (Multi-Small-Agent Reinforcement Learning) reduces cognitive overhead in tool-enabled reasoning by decoupling high-level reasoning from tool interpretation: a dedicated Reasoning Agent handles the former, and specialized Tool Agents handle the latter. The agents are trained jointly with collaboration-oriented rewards, which supports efficient information flow and scalable interaction patterns. On mathematical problem solving that requires code execution, MSARL achieves higher reasoning stability and final-answer accuracy than single-agent baselines, and it generalizes to diverse tool-use tasks. The work offers empirical evidence and a modular blueprint for building scalable, specialized-agent AI systems.
Abstract
Recent advances in multi-agent systems highlight the potential of specialized small agents that collaborate via division of labor. Existing tool-integrated reasoning systems, however, often follow a single-agent paradigm in which one large model interleaves long-horizon reasoning with precise tool operations, leading to cognitive-load interference and unstable coordination. We present MSARL, a Multi-Small-Agent Reinforcement Learning framework that explicitly decouples reasoning from tool use. In MSARL, a Reasoning Agent decomposes problems and plans tool invocations, while multiple Tool Agents specialize in specific external tools, each trained via a combination of imitation learning and reinforcement learning with role-specific rewards. On mathematical problem solving with code execution, MSARL significantly improves reasoning stability and final-answer accuracy over single-agent baselines. Moreover, the architecture generalizes to diverse tool-use tasks, demonstrating that cognitive-role decoupling with small agents is a scalable blueprint for multi-agent AI design.
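The division of roles described above can be illustrated with a minimal sketch. It is not the authors' implementation and omits training and rewards entirely. It only shows one way a planner and a tool specialist might exchange messages. All names here (`ToolCall`, `ReasoningAgent`, `CodeToolAgent`, `solve`) are hypothetical, and a small arithmetic evaluator stands in for real code execution.

```python
import ast
import operator
from dataclasses import dataclass
from typing import Dict, List

# Arithmetic operators the stand-in "code execution" tool supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval_node(node: ast.AST) -> float:
    """Evaluate a numeric constant or a binary arithmetic expression."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    raise ValueError("unsupported expression")


@dataclass
class ToolCall:
    tool: str     # name of the Tool Agent that should handle the request
    request: str  # payload for that tool (here, an arithmetic expression)


class CodeToolAgent:
    """Hypothetical Tool Agent: runs its tool and returns a compact result."""

    def handle(self, request: str) -> str:
        tree = ast.parse(request, mode="eval")
        return str(_eval_node(tree.body))


class ReasoningAgent:
    """Hypothetical Reasoning Agent: plans tool calls but never runs a tool."""

    def plan(self, problem: str) -> List[ToolCall]:
        # Toy decomposition: each ';'-separated expression becomes one call.
        parts = [p.strip() for p in problem.split(";")]
        return [ToolCall("code", p) for p in parts if p]

    def answer(self, observations: List[str]) -> str:
        # Toy aggregation: treat the last observation as the final answer.
        return observations[-1]


def solve(problem: str, reasoner: ReasoningAgent,
          tool_agents: Dict[str, CodeToolAgent]) -> str:
    """Route each planned call to the matching Tool Agent, then aggregate."""
    observations = []
    for call in reasoner.plan(problem):
        observations.append(tool_agents[call.tool].handle(call.request))
    return reasoner.answer(observations)


result = solve("2 + 3; (2 + 3) * 4", ReasoningAgent(), {"code": CodeToolAgent()})
```

In this sketch the Reasoning Agent only sees the short strings returned by the Tool Agent, never the raw tool internals. That is the separation of concerns the abstract attributes to MSARL. How the real agents are prompted, trained, and rewarded is not specified in this section.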
