ToolBrain: A Flexible Reinforcement Learning Framework for Agentic Tools

Quy Minh Le; Minh Sao Khue Luu; Khanh-Tung Tran; Duc-Hai Nguyen; Hoang-Quoc-Viet Pham; Quan Le; Hoang Thanh Lam; Hoang D. Nguyen

TL;DR

ToolBrain is a flexible reinforcement learning framework for training agentic AI to use tools. It supports multiple learning strategies (GRPO and DPO) and a hybrid reward system that combines user-defined signals with LLM-based judgments. Its Coach–Athlete paradigm separates high-level orchestration from task execution, with an Adapter that produces rich execution traces for RL feedback. The framework also provides ToolRetriever for intelligent tool selection, zero-learning data generation, knowledge distillation, and efficient fine-tuning via Unsloth, QLoRA, and bitsandbytes. It is demonstrated on an Email Search task, with reported improvements of up to 30.0% in tool-use skills over baselines. The results show faster convergence and robust tool-use improvements, highlighting ToolBrain's potential to lower the barrier to deploying domain-adapted, tool-using agents in resource-constrained settings. Overall, ToolBrain offers a practical, extensible path for researchers and practitioners to develop and deploy capable agentic systems with configurable rewards and tooling.

Abstract

Effective tool use is essential for agentic AI, yet training agents to use tools remains challenging because of manually designed rewards, limited training data, and poor multi-tool selection, which result in slow adaptation, wasted computational resources, and suboptimal performance. We introduce ToolBrain, a lightweight and user-friendly framework for coaching tool use in agentic models with flexible reinforcement learning (RL), lowering the barriers for researchers and practitioners to adapt LLM-based agents to specific domains. It supports a wide range of training strategies, including RL algorithms such as GRPO and DPO, as well as supervised learning. ToolBrain lets users define custom reward callables that operate directly on an agent's execution traces, or rely on an automated LLM-as-a-judge system for reward generation. It also provides knowledge distillation from large to small models for efficient development, automatic task generation from tool descriptions, seamless tool retrieval, efficient fine-tuning pipelines with QLoRA through Unsloth, and quantized inference via bitsandbytes. We demonstrate ToolBrain through diverse use cases, such as training a CodeAct agent to autonomously execute email search tasks, showing fast, targeted improvements (up to 30.0%) in tool-use skills while keeping the codebase simple and extensible for agentic AI development. Our framework is publicly available at https://toolbrain.org.
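To make the idea of a reward callable over execution traces more concrete, the following is a minimal sketch in plain Python. It does not use ToolBrain's actual API: the `Step` record, the `trace_reward` function, its penalty weights, and the `group_relative_advantages` helper are all hypothetical names chosen for illustration. The second function shows the group-normalized advantage commonly used in GRPO-style training, where each sampled trace is scored relative to the other traces generated for the same query.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Step:
    """One tool call in an execution trace (hypothetical structure)."""
    tool_name: str
    tool_output: str
    error: bool = False


def trace_reward(trace: List[Step], expected: str) -> float:
    """Hypothetical user-defined reward computed from a full trace.

    Rewards traces whose tool outputs contain the expected answer,
    and penalizes tool errors and overly long traces. The weights
    (0.1, 0.05) and the step budget (3) are illustrative assumptions.
    """
    if not trace:
        return 0.0
    found = any(expected in step.tool_output for step in trace)
    error_penalty = 0.1 * sum(step.error for step in trace)
    length_penalty = 0.05 * max(0, len(trace) - 3)
    score = (1.0 if found else 0.0) - error_penalty - length_penalty
    return max(0.0, score)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of traces sampled for the same query."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: two sampled traces for one email-search query.
good = [Step("search_emails", "message id=42 from alice")]
bad = [Step("search_emails", "no results", error=True)]
rewards = [trace_reward(good, "id=42"), trace_reward(bad, "id=42")]
advantages = group_relative_advantages(rewards)
```

In this sketch the trace that retrieves the expected message receives a positive advantage and the failing trace a negative one. An LLM-as-a-judge reward could be substituted for `trace_reward` without changing the normalization step.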

Paper Structure

Table of Contents

Figures (7)