Towards General Agentic Intelligence via Environment Scaling

Runnan Fang, Shihao Cai, Baixuan Li, Jialong Wu, Guangyu Li, Wenbiao Yin, Xinyu Wang, Xiaobin Wang, Liangcai Su, Zhen Zhang, Shibin Wu, Zhengwei Tao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

TL;DR

This work tackles the scarcity of agentic trajectories by proposing principled environment scaling to create fully simulated, diverse tool-use environments and a two-stage agent experience learning workflow. The authors automate environment construction via a tool graph, domain partitions, and domain-specific databases, while grounding tasks in verifiable read–write operations. A two-phase fine-tuning regime first builds general tool-usage skills and then specializes to vertical domains, achieving state-of-the-art results among open-source models under 1T parameters on τ-bench, τ²‑Bench, and ACEBench, with notable robustness and generalization. The study also highlights ongoing challenges in long-horizon tool calling and discusses future directions including RL on simulated environments and scaling to larger models.

Abstract

Advanced agentic intelligence is a prerequisite for deploying Large Language Models in practical, real-world applications. Diverse real-world APIs demand precise, robust function-calling intelligence, which requires agents to develop these capabilities through interaction in varied environments. The breadth of function-calling competence is closely tied to the diversity of environments in which agents are trained. In this work, we scale up environments as a step towards advancing general agentic intelligence. This gives rise to two central challenges: (i) how to scale environments in a principled manner, and (ii) how to effectively train agentic capabilities from experiences derived through interactions with these environments. To address these, we design a scalable framework that automatically constructs heterogeneous, fully simulated environments, systematically broadening the space of function-calling scenarios. We further adapt a two-phase agent fine-tuning strategy: first endowing agents with fundamental agentic capabilities, then specializing them for domain-specific contexts. Extensive experiments on the agentic benchmarks τ-bench, τ²-Bench, and ACEBench demonstrate that our trained model, AgentScaler, significantly enhances the function-calling capability of models.
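To make the idea of grounding tasks in verifiable read–write operations concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy in-memory "database" for a single hypothetical domain, two hypothetical tools (`get_order_status` as a read, `cancel_order` as a write), and a hypothetical `verify` helper that checks an agent's tool calls by comparing the final database state against the state produced by a reference call sequence.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical toy domain database: order_id -> record.
Database = Dict[str, Dict[str, Any]]
# A tool call is (tool_name, keyword_arguments).
ToolCall = Tuple[str, Dict[str, Any]]


def get_order_status(db: Database, order_id: str) -> str:
    """Read operation: inspects the state without changing it."""
    return db[order_id]["status"]


def cancel_order(db: Database, order_id: str) -> str:
    """Write operation: mutates the state."""
    db[order_id]["status"] = "cancelled"
    return "ok"


# Hypothetical tool registry for this one simulated domain.
TOOLS: Dict[str, Callable[..., Any]] = {
    "get_order_status": get_order_status,
    "cancel_order": cancel_order,
}


def run_calls(db: Database, calls: List[ToolCall]) -> Database:
    """Apply a sequence of tool calls to a copy of db and return the final state."""
    state = copy.deepcopy(db)
    for name, kwargs in calls:
        TOOLS[name](state, **kwargs)
    return state


def verify(initial_db: Database, agent_calls: List[ToolCall],
           gold_calls: List[ToolCall]) -> bool:
    """A trajectory passes if it reaches the same final state as the reference."""
    return run_calls(initial_db, agent_calls) == run_calls(initial_db, gold_calls)
```

In this sketch, extra read calls do not affect the verdict because they leave the state unchanged, while a wrong write call produces a different final state and fails the check. The real framework described here also involves a tool graph, domain partitioning, and a simulated user, none of which are modeled in this example.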

Paper Structure

This paper contains 24 sections, 2 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overview of automatic environment construction and agentic task construction.
  • Figure 2: The agent interacts with the simulated user and changes the environment state through the generated functions.
  • Figure 3: Performance comparison on the Normal, Agent, and Overall subsets of ACEBench-en for two-stage training models.
  • Figure 4: Pass$^k$ metric results across all domains in $\tau^2$-Bench.
  • Figure 5: Accuracy by tool call count on $\tau$-bench.
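
Figure 4 reports the pass^k metric, which this summary does not define. As a hedged illustration, the sketch below assumes the definition commonly used with τ-bench: for a task run n times with c successes, the chance that k trials drawn without replacement all succeed is C(c, k) / C(n, k), averaged over tasks. The function name `pass_hat_k` and its parameters are hypothetical and not taken from the paper.

```python
from math import comb
from typing import Sequence


def pass_hat_k(successes_per_task: Sequence[int], num_trials: int, k: int) -> float:
    """Estimate pass^k under the assumed definition: mean over tasks of C(c, k) / C(n, k)."""
    if not 1 <= k <= num_trials:
        raise ValueError("k must satisfy 1 <= k <= num_trials")
    if not successes_per_task:
        raise ValueError("need at least one task")
    if any(c < 0 or c > num_trials for c in successes_per_task):
        raise ValueError("each success count must lie in [0, num_trials]")
    denom = comb(num_trials, k)
    # comb(c, k) is 0 when c < k, so tasks with fewer than k successes contribute 0.
    return sum(comb(c, k) / denom for c in successes_per_task) / len(successes_per_task)
```

For example, with three tasks that succeed 4, 2, and 0 times out of 4 trials, `pass_hat_k([4, 2, 0], 4, 1)` equals the plain average success rate, and the value can only stay the same or shrink as k grows, which is why the metric is used as a consistency measure.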