Agent models: Internalizing Chain-of-Action Generation into Reasoning models

Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Xinyan Wen, Jitao Sang

TL;DR

This work defines Large Agent Models (LAMs) as reasoning models that internalize Chain-of-Action generation to autonomously control tool usage, and introduces AutoCoA, which trains CoT and CoA end-to-end with an internal world model. AutoCoA combines supervised fine-tuning (with step-level and trajectory-level action learning) and reinforcement learning (with GRPO and simulated-to-real environment transitions) to reduce external interactions while improving long-horizon reasoning. Empirical results on open-domain QA show AutoCoA variants surpass ReAct baselines, with notable gains in multi-hop tasks and efficient use of simulated rollouts to limit real-world tool calls. The paper also outlines a staged agent roadmap toward broader agent capabilities, including RPA, domain analysis, IoT integration, and eventually an Agent OS enabling multi-agent collaboration. This positions AutoCoA as a practical step toward robust, autonomous, tool-augmented reasoning systems in real-world tasks.

Abstract

Traditional agentic workflows rely on external prompts to manage interactions with tools and the environment, which limits the autonomy of reasoning models. We position Large Agent Models (LAMs) that internalize the generation of Chain-of-Action (CoA), enabling the model to autonomously decide when and how to use external tools. Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the model to seamlessly switch between reasoning and action while efficiently managing environment interactions. Main components include step-level action triggering, trajectory-level CoA optimization, and an internal world model to reduce real-environment interaction costs. Evaluations on open-domain QA tasks demonstrate that AutoCoA-trained agent models significantly outperform ReAct-based workflows in task completion, especially in tasks that require long-term reasoning and multi-step actions. Code and dataset are available at https://github.com/ADaM-BJTU/AutoCoA

Paper Structure

This paper contains 29 sections, 7 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: The path from Large Language Model (LLM) to Large Reasoning Model (LRM) and the forthcoming Large Agent Model (LAM).
  • Figure 2: Interaction paradigm: Chatbot vs. Reasoner vs. Agent.
  • Figure 3: Analysis of long-horizon execution capabilities.
  • Figure 4: Agent roadmap.

Theorems & Definitions (1)

  • Definition 1