AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning

Yifan Wei, Xiaoyan Yu, Yixuan Weng, Tengfei Pan, Angsheng Li, Li Du

TL;DR

AutoTIR addresses the challenge of balancing external tool use with core language abilities in reasoning tasks. It uses reinforcement learning with a hybrid reward to enable an LLM to autonomously decide whether and which tools to invoke, while preserving instruction-following competence. The method employs GRPO and a multi-tool inference protocol, demonstrating superior performance and generalization across knowledge-intensive, mathematical, and open-domain benchmarks. These results suggest a scalable, generalizable approach for tool-augmented reasoning in large language models.

Abstract

Large Language Models (LLMs), when enhanced through reasoning-oriented post-training, evolve into powerful Large Reasoning Models (LRMs). Tool-Integrated Reasoning (TIR) further extends their capabilities by incorporating external tools, but existing methods often rely on rigid, predefined tool-use patterns that risk degrading core language competence. Inspired by the human ability to adaptively select tools, we introduce AutoTIR, a reinforcement learning framework that enables LLMs to autonomously decide whether and which tool to invoke during the reasoning process, rather than following static tool-use strategies. AutoTIR leverages a hybrid reward mechanism that jointly optimizes for task-specific answer correctness, structured output adherence, and penalization of incorrect tool usage, thereby encouraging both precise reasoning and efficient tool integration. Extensive evaluations across diverse knowledge-intensive, mathematical, and general language modeling tasks demonstrate that AutoTIR achieves superior overall performance, significantly outperforming baselines, and exhibits superior generalization in tool-use behavior. These results highlight the promise of reinforcement learning in building truly generalizable and scalable TIR capabilities in LLMs. The code and data are available at https://github.com/weiyifan1023/AutoTIR.
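
The hybrid reward described in the abstract combines three signals: answer correctness, structured output adherence, and a penalty for incorrect tool usage. The following is a minimal sketch of how such a reward could be composed, not the authors' implementation. The weight values, the `<answer>` tag format, the exact-match correctness check, and the definition of "incorrect tool usage" as a mismatch between `tool_needed` and `tool_used` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the source does not state any values.
    correct: float = 1.0
    format: float = 0.2
    tool_penalty: float = 0.5


# Assumed output convention: the final answer is wrapped in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def hybrid_reward(
    response: str,
    gold: str,
    tool_needed: bool,
    tool_used: bool,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Combine correctness, format adherence, and a tool-misuse penalty."""
    match = ANSWER_RE.search(response)
    format_ok = match is not None
    answer = match.group(1).strip() if match else ""

    # Assumed correctness check: case-insensitive exact match.
    correct = format_ok and answer.lower() == gold.strip().lower()

    # Assumed notion of incorrect tool usage: calling a tool when none
    # was needed, or skipping a tool when one was needed.
    wrong_tool_use = tool_used != tool_needed

    reward = 0.0
    if correct:
        reward += w.correct
    if format_ok:
        reward += w.format
    if wrong_tool_use:
        reward -= w.tool_penalty
    return reward
```

Under these assumptions, a well-formatted correct answer with an appropriate tool decision earns the correctness and format terms in full, while the same answer produced with an unnecessary tool call loses the penalty term. This is one way a reward can discourage reflexive tool use without punishing the answer itself.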

Paper Structure

This paper contains 23 sections, 8 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: AutoTIR balances tool-integrated reasoning with instruction-following ability.
  • Figure 2: Overall framework of AutoTIR. Top: Comparison between AutoTIR and existing paradigms (fixed reasoning strategy vs. autonomous decision). Bottom: GRPO training pipeline that incorporates multiple reasoning actions.
  • Figure 3: Avg. reward score and response length during training.
  • Figure 4: Model Performance and Tool Advantage Across Reasoning Task Types.
  • Figure 5: System prompt template used by AutoTIR for training and inference.
  • ...and 3 more figures
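
The GRPO training pipeline named in the Figure 2 caption scores a group of sampled responses for the same prompt and normalizes each reward against the group's statistics. The sketch below shows that group-relative advantage step in its commonly stated form. It is an illustration rather than the paper's code: the function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward by the mean and std of its sampling group.

    `rewards` holds one scalar reward per response sampled for a single
    prompt. `eps` (an assumed stabilizer) avoids division by zero when
    every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical responses to one prompt: two rewarded, two not.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Responses that score above the group mean receive positive advantages and those below receive negative ones. When all responses in a group score the same, every advantage is zero and that group contributes no learning signal.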