Agent models: Internalizing Chain-of-Action Generation into Reasoning models
Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Xinyan Wen, Jitao Sang
TL;DR
This work defines Large Agent Models (LAMs) as reasoning models that internalize Chain-of-Action (CoA) generation so that they control tool usage autonomously, and introduces AutoCoA, a framework that trains chain-of-thought (CoT) and CoA end to end with an internal world model. AutoCoA combines supervised fine-tuning (with step-level and trajectory-level action learning) and reinforcement learning (with GRPO and a transition from simulated to real environments) to reduce external interactions while improving long-horizon reasoning. Empirical results on open-domain QA show that AutoCoA variants surpass ReAct baselines, with notable gains on multi-hop tasks, and that simulated rollouts limit the number of real-world tool calls. The paper also outlines a staged roadmap toward broader agent capabilities, including RPA, domain analysis, IoT integration, and eventually an Agent OS that enables multi-agent collaboration. This positions AutoCoA as a practical step toward robust, autonomous, tool-augmented reasoning systems for real-world tasks.
Abstract
Traditional agentic workflows rely on external prompts to manage interactions with tools and the environment, which limits the autonomy of reasoning models. We introduce Large Agent Models (LAMs), which internalize the generation of Chain-of-Action (CoA) so that the model decides autonomously when and how to use external tools. Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the model to switch seamlessly between reasoning and action while managing environment interactions efficiently. Its main components are step-level action triggering, trajectory-level CoA optimization, and an internal world model that reduces the cost of real-environment interaction. Evaluations on open-domain QA tasks demonstrate that AutoCoA-trained agent models significantly outperform ReAct-based workflows in task completion, especially on tasks that require long-term reasoning and multi-step actions. Code and dataset are available at https://github.com/ADaM-BJTU/AutoCoA
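
To make the idea of switching between reasoning and action concrete, the following is a minimal sketch of a loop in which a policy emits thoughts, actions, or an answer, and each action is answered either by a real tool or by a simulated world model. It is not the AutoCoA implementation: every name here (`Step`, `run_chain_of_action`, `toy_policy`, `toy_tool`, `toy_world_model`) is hypothetical, and the policy is a hand-written stand-in for a trained model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    kind: str      # "think", "action", "observation", or "answer"
    content: str


def run_chain_of_action(
    policy: Callable[[List[Step]], Step],
    real_tool: Callable[[str], str],
    world_model: Callable[[str], str],
    use_real_env: bool,
    max_steps: int = 8,
) -> Tuple[List[Step], int]:
    """Interleave thoughts and actions until the policy emits an answer.

    Returns the trajectory and the number of real tool calls made.
    """
    trajectory: List[Step] = []
    real_calls = 0
    for _ in range(max_steps):
        step = policy(trajectory)
        trajectory.append(step)
        if step.kind == "answer":
            break
        if step.kind == "action":
            if use_real_env:
                # Costly path: query the actual tool.
                observation = real_tool(step.content)
                real_calls += 1
            else:
                # Cheap path: a simulated response stands in for the tool.
                observation = world_model(step.content)
            trajectory.append(Step("observation", observation))
    return trajectory, real_calls


# Hypothetical stand-ins for a trained policy, a search tool, and a world model.
def toy_policy(trajectory: List[Step]) -> Step:
    kinds = [s.kind for s in trajectory]
    if "observation" in kinds:
        return Step("answer", trajectory[-1].content)
    if "think" not in kinds:
        return Step("think", "I need to look this up.")
    return Step("action", "search: capital of France")


def toy_tool(query: str) -> str:
    return "Paris"


def toy_world_model(query: str) -> str:
    return "Paris (simulated)"


sim_traj, sim_calls = run_chain_of_action(
    toy_policy, toy_tool, toy_world_model, use_real_env=False
)
real_traj, real_calls = run_chain_of_action(
    toy_policy, toy_tool, toy_world_model, use_real_env=True
)
```

In this toy setup the simulated run completes the same think, action, observation, answer sequence without touching the real tool, which is the kind of saving the internal world model is meant to provide during training.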
