Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?

Zhengyu Chen, Jinluan Yang, Teng Xiao, Ruochen Zhou, Luan Zhang, Xiangyu Xi, Xiaowei Shi, Wei Wang, Jinggang Wang

TL;DR

The paper addresses whether tool-augmented RL agents trained in a single domain (math) can generalize to diverse domains, leveraging a code interpreter as the tool. It introduces Tool Generalization Reinforcement Learning (TGRL), combining a standardized tool interface, a dual-component reward, and an XML-based prompt template to promote domain-agnostic tool usage. Across seven reasoning benchmarks spanning math and general-domain tasks, TGRL achieves state-of-the-art results and demonstrates robust cross-domain transfer, with performance improving as model size grows from 7B to 32B. The findings suggest that abstract, domain-invariant tool-use policies learned in one domain can generalize to unseen tasks, enabling more versatile and token-efficient LLM reasoning with tools, and they offer actionable design principles for cross-domain tool integration.

Abstract

Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in reasoning and tool utilization. However, the generalization of tool-augmented reinforcement learning (RL) across diverse domains remains underexplored. In this work, we investigate the cross-domain generalization of an LLM agent that is equipped with a code interpreter tool and trained exclusively on mathematical problem-solving tasks. Although training is restricted to this single domain, we evaluate the agent's performance across several distinct reasoning domains. The results reveal that RL-based tool usage learned from mathematical tasks transfers effectively to complex tasks in other domains, yielding strong task performance and high token efficiency. To facilitate this cross-domain transfer, we propose a Tool Generalization Reinforcement Learning (TGRL) framework designed to promote domain-agnostic learning and skill migration. It comprises: (i) a standardized tool interface that abstracts domain-specific nuances through consistent formatting and explicit termination, fostering transferable invocation patterns; (ii) a dual-component reward system that decomposes rewards to incentivize generalizable behaviors such as tool efficiency and reasoning abstraction, ensuring alignment and robustness across domain shifts; and (iii) an XML-based prompt template that separates thinking, tool calls, and responses to encourage modular, domain-invariant planning and coherent multi-turn interactions. Extensive experiments across diverse benchmarks validate our approach, which achieves state-of-the-art performance and highlights the cross-domain potential of Tool RL for LLM reasoning.
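
To make the three components concrete, the following is a minimal sketch of how an XML-structured rollout might be parsed and scored with a two-part reward. It is not the authors' implementation. The tag names (`think`, `tool_call`, `answer`), the split into a format term and an outcome term, the exact-match answer check, and the weights `w_format` and `w_outcome` are all illustrative assumptions; the abstract does not specify any of them.

```python
import re
from typing import List

# Hypothetical tag names: the abstract only says the template separates
# thinking, tool calls, and responses.
THINK, TOOL, ANSWER = "think", "tool_call", "answer"


def extract(tag: str, text: str) -> List[str]:
    """Return the stripped contents of every <tag>...</tag> block in text."""
    pattern = rf"<{tag}>(.*?)</{tag}>"
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]


def format_reward(text: str) -> float:
    """Assumed format term: 1.0 if the rollout contains at least one thinking
    block and exactly one answer block that closes the rollout (a stand-in
    for 'explicit termination'), else 0.0."""
    has_think = len(extract(THINK, text)) >= 1
    one_answer = len(extract(ANSWER, text)) == 1
    terminated = text.strip().endswith(f"</{ANSWER}>")
    return 1.0 if (has_think and one_answer and terminated) else 0.0


def outcome_reward(text: str, gold: str) -> float:
    """Assumed outcome term: exact string match of the final answer block."""
    answers = extract(ANSWER, text)
    return 1.0 if answers and answers[-1] == gold.strip() else 0.0


def dual_reward(text: str, gold: str,
                w_format: float = 0.2, w_outcome: float = 0.8) -> float:
    """Weighted sum of the two assumed terms; the weights are illustrative."""
    return w_format * format_reward(text) + w_outcome * outcome_reward(text, gold)


if __name__ == "__main__":
    rollout = (
        "<think>Add the two numbers with the interpreter.</think>"
        "<tool_call>print(2 + 3)</tool_call>"
        "<answer>5</answer>"
    )
    print(dual_reward(rollout, "5"))
```

Because the format term depends only on the tag structure and not on the task content, a check of this kind is domain-agnostic by construction, which is the property the abstract attributes to the standardized interface and prompt template.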

Paper Structure

This paper contains 32 sections, 13 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Cross-domain comparisons including model performance, interaction turns, and token length for the output on Webinstruct, where we perform tool RL training on Qwen2.5-7B using the code-integrated math dataset.
  • Figure 2: Illustrations of our proposed Tool Generalization Reinforcement Learning (TGRL).
  • Figure : Training Steps vs Format Accuracy
  • Figure : 7B Model Performance
  • Figure : Training Steps vs Format Accuracy
  • ...and 3 more figures