VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Dongfu Jiang, Yi Lu, Zhuofeng Li, Zhiheng Lyu, Ping Nie, Haozhe Wang, Alex Su, Hui Chen, Kai Zou, Chao Du, Tianyu Pang, Wenhu Chen

TL;DR

VerlTool tackles key bottlenecks in Agentic Reinforcement Learning with Tool Use by delivering a unified, modular framework that decouples RL training from tool execution, supports diverse multimodal tools, and enables asynchronous rollouts for efficiency. Grounded in VeRL alignment, VerlTool provides a standardized tool server API, a plug-in architecture for rapid tool integration, and scalable parallel backends, enabling multi-turn ARLT across six domains. Empirical results show VerlTool achieves competitive performance with specialized systems while offering a cohesive training infrastructure and insights into tool-usage dynamics and emerging agentic behaviors. The open-source release aims to catalyze community adoption and rapid experimentation in tool-augmented RL research.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated success in enhancing LLM reasoning capabilities, but remains limited to single-turn interactions without tool integration. While recent Agentic Reinforcement Learning with Tool Use (ARLT) approaches have emerged to address multi-turn tool interactions, existing works develop task-specific codebases that suffer from fragmentation, synchronous execution bottlenecks, and limited extensibility across domains. These inefficiencies hinder broader community adoption and algorithmic innovation. We introduce VerlTool, a unified and modular framework that addresses these limitations through systematic design principles. VerlTool provides four key contributions: (1) upstream alignment with VeRL ensuring compatibility and simplified maintenance, (2) unified tool management via standardized APIs supporting diverse modalities including code execution, search, SQL databases, and vision processing, (3) asynchronous rollout execution achieving a nearly 2$\times$ speedup by eliminating synchronization bottlenecks, and (4) comprehensive evaluation demonstrating competitive performance across 6 ARLT domains. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. We train and evaluate models on mathematical reasoning, knowledge QA, SQL generation, visual reasoning, web search, and software engineering tasks, achieving results comparable to specialized systems while providing unified training infrastructure. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions, significantly reducing development overhead and providing a scalable foundation for tool-augmented RL research. Our code is open-sourced at https://github.com/TIGER-AI-Lab/verl-tool.
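
To make the synchronization bottleneck mentioned in the abstract concrete, the following is a toy timing model, not VerlTool code. It assumes each trajectory is a list of per-turn durations (generation plus tool call) and that trajectories can otherwise run fully in parallel. The function names and the numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List


def sync_makespan(turn_times: List[List[float]]) -> float:
    """Wall time when every turn ends with a batch-wide barrier.

    turn_times[i][t] is the (assumed) duration of turn t for trajectory i.
    """
    num_turns = max(len(times) for times in turn_times)
    total = 0.0
    for turn in range(num_turns):
        # The whole batch waits for the slowest trajectory still active this turn.
        total += max(times[turn] for times in turn_times if turn < len(times))
    return total


def async_makespan(turn_times: List[List[float]]) -> float:
    """Wall time when each trajectory advances as soon as its own tool call returns."""
    # Without barriers, the batch finishes when the slowest single trajectory finishes.
    return max(sum(times) for times in turn_times)


# Hypothetical per-turn durations (arbitrary units) for three trajectories.
example = [[1.0, 3.0], [3.0, 1.0], [2.0]]
print(sync_makespan(example))   # 6.0
print(async_makespan(example))  # 4.0
```

In this toy setting the barrier makes every turn cost as much as its slowest trajectory, so slow tool calls in different trajectories add up, whereas the asynchronous schedule only pays for the slowest trajectory overall. The real speedup depends on generation batching, tool latency variance, and server capacity, which this sketch ignores.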

Paper Structure

This paper contains 46 sections, 11 equations, 5 figures, 12 tables.

Figures (5)

  • Figure 1: Overview of VerlTool, a modular and efficient framework for the Agentic Reinforcement Learning with Tool Use (ARLT) training paradigm, in which the RL workflow and tool execution are fully disaggregated for both efficiency and extensibility.
  • Figure 2: Visualization of the Async Rollout pipeline design and the time it saves.
  • Figure 3: Example of code design for adding a new tool in VerlTool via the plugin interface.
  • Figure 4: Tokenizing the LLM-generated content "...</python>" and the tool observation "\n<result>..." with the Qwen2.5 tokenizer can produce different token lists under different strategies.
  • Figure 5: Training dynamics using VerlTool on all 6 tasks. The corresponding test benchmarks are, respectively, AIME24, NQ, Spider-Test, VStar, GAIA, and SWE-Verified. All models are trained and evaluated with the VerlTool framework. The actual evaluation performance (purple dashed line) can be higher because the training and evaluation settings differ. The number of actions is averaged over all sampled responses in each batch.