GTM: Simulating the World of Tools for AI Agents

Zhenzhen Ren, Xinpeng Zhang, Zhenxing Qian, Yan Gao, Yu Shi, Shuxin Zheng, Jiyan He

TL;DR

GTM is a universal tool simulator that decouples agent learning from real tool calls, substantially accelerating RL training while preserving output quality. It is trained on data from the CARG pipeline, which synthesizes over 20,000 tools across 300 domains to teach the syntax, semantics, and multi-turn context of tool usage. Experiments show that GTM achieves near-parity with real tools in output quality, delivers large speedups (about 6× in the search-tool scenario), generalizes robustly to unseen tools, and adapts to new domains via fine-tuning. This approach provides a scalable, cost-effective foundation for training tool-augmented AI agents across diverse applications.

Abstract

The integration of external tools is pivotal for empowering Large Language Model (LLM) agents with real-world capabilities. However, training these agents through direct, continuous interaction with diverse tools is often prohibitively expensive and slow, and it introduces additional development and maintenance overhead. To address this challenge, we introduce the Generalist Tool Model (GTM), a 1.5-billion-parameter model that learns to act as a universal tool simulator. Configured only at the prompt level, GTM takes a tool's functional description and input arguments and generates outputs that faithfully mimic real tool execution, providing a fast, cost-effective solution that eliminates tool development overhead. To build GTM, we propose the Context-Aware Response Generation (CARG) pipeline, which synthesizes comprehensive training data covering over 20,000 tools across 300 domains, including physics, medicine, robotics, and finance. Through this pipeline, GTM learns to produce outputs that are not only syntactically correct but also logically coherent and contextually appropriate. Experiments demonstrate that GTM produces high-quality outputs with strong consistency and reliability. Moreover, when used for agent training in real reinforcement learning scenarios, GTM simulates tool calls significantly faster than real tools execute while maintaining comparable output quality, and it shows strong generalization and domain adaptability. Our results establish GTM as a foundational component for developing future AI agents, enabling efficient and scalable training of tool-augmented systems.
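The abstract describes GTM as a drop-in replacement for real tools that is configured purely through the prompt. The sketch below illustrates that decoupling under our own assumptions: `ToolSpec`, its fields, the prompt wording, and the `generate` callable are hypothetical placeholders, not GTM's actual template (Figure 2) or inference API. The point is that the agent-side loop is identical whether calls go to real code or to a prompted simulator, and that prior outputs are fed back as multi-turn context.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


@dataclass
class ToolSpec:
    """Hypothetical tool description; the fields are illustrative, not GTM's template."""
    name: str
    description: str
    parameters: dict[str, str] = field(default_factory=dict)


class ToolBackend(Protocol):
    """Anything that can answer a tool call given the spec, arguments, and prior outputs."""
    def call(self, spec: ToolSpec, args: dict[str, Any], history: list[str]) -> str: ...


class RealToolBackend:
    """Executes real Python functions (stand-ins for real APIs or services)."""
    def __init__(self, registry: dict[str, Callable[..., Any]]):
        self.registry = registry

    def call(self, spec: ToolSpec, args: dict[str, Any], history: list[str]) -> str:
        return json.dumps(self.registry[spec.name](**args))


class SimulatedToolBackend:
    """Delegates to a text generator that imitates the tool from a prompt alone."""
    def __init__(self, generate: Callable[[str], str]):
        self.generate = generate  # hypothetical: any prompt -> completion function

    @staticmethod
    def build_prompt(spec: ToolSpec, args: dict[str, Any], history: list[str]) -> str:
        # Everything the simulator knows about the tool lives in this prompt.
        return "\n".join([
            "Simulate the tool below and reply only with its output.",
            f"Tool: {spec.name}",
            f"Description: {spec.description}",
            f"Parameters: {json.dumps(spec.parameters)}",
            f"Previous calls: {json.dumps(history)}",  # multi-turn context
            f"Arguments: {json.dumps(args)}",
        ])

    def call(self, spec: ToolSpec, args: dict[str, Any], history: list[str]) -> str:
        return self.generate(self.build_prompt(spec, args, history))


def run_episode(backend: ToolBackend, spec: ToolSpec, calls: list[dict[str, Any]]) -> list[str]:
    """Agent-side loop: identical code whether the backend is real or simulated."""
    history: list[str] = []
    for args in calls:
        history.append(backend.call(spec, args, history))
    return history


spec = ToolSpec("add", "Adds two integers.", {"a": "int", "b": "int"})
real = RealToolBackend({"add": lambda a, b: {"sum": a + b}})
sim = SimulatedToolBackend(lambda prompt: '{"sum": "<model output>"}')  # dummy model
print(run_episode(real, spec, [{"a": 1, "b": 2}]))  # ['{"sum": 3}']
print(run_episode(sim, spec, [{"a": 1, "b": 2}]))   # ['{"sum": "<model output>"}']
```

In a real setup the dummy lambda would be replaced by a call to a trained simulator; swapping backends requires no change to `run_episode`, which is the property that lets simulated tools stand in for real ones during RL rollouts.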

Paper Structure

This paper contains 22 sections, 7 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Comparison between using real tools and using GTM-simulated tools. (a) A real-tool environment requires many separate tools. (b) With only prompt-level modification, GTM can simulate diverse tools, offering a broader range of choices for agent tool learning.
  • Figure 2: Unified Tool Template Structure
  • Figure 3: Overview of the top 12 fields (inner ring: field; outer ring: subfield).
  • Figure 4: Validation scores in search tool scenarios. GTM-Only achieves 0.417 compared to Real-Tool's 0.424.
  • Figure 5: Average time per step in search tool scenarios. GTM-Only takes 5,255 seconds in total compared to Real-Tool's 33,092 seconds, a roughly 6× speedup.
  • ...and 3 more figures
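The captions of Figures 4 and 5 can be combined into a quick back-of-the-envelope check. The snippet below only restates the reported numbers; it adds no new measurements, and the variable names are our own.

```python
# Numbers taken from the captions of Figures 4 and 5 (search-tool scenarios).
real_time_s, gtm_time_s = 33_092, 5_255
real_score, gtm_score = 0.424, 0.417

speedup = real_time_s / gtm_time_s      # wall-clock ratio, Real-Tool vs GTM-Only
score_gap = real_score - gtm_score      # absolute validation-score gap
retained = gtm_score / real_score       # fraction of the Real-Tool score kept

print(f"speedup: {speedup:.2f}x")       # speedup: 6.30x
print(f"score gap: {score_gap:.3f}")    # score gap: 0.007
print(f"score retained: {retained:.1%}")  # score retained: 98.3%
```

Read together, the two captions suggest roughly a 6.3× reduction in time for about 98% of the Real-Tool validation score in this setting.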