Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning

Ningning Xu, Yuxuan Jiang, Shubhashis Roy Dipta

TL;DR

This work addresses pattern mismatch in tool-integrated reasoning for large reasoning models with a two-stage learning framework. Stage 1 builds code competence from both calculator-style and algorithmic-pattern data, while Stage 2 aligns pattern choice with teacher signals using Direct Preference Optimization. Results on MATH500 and AIME24 show gains in Code@1 (from 64.0% to 70.5% and from 26.7% to 50.0%, respectively) and in overall accuracy, indicating that how a tool is used matters as much as when it is invoked.

Abstract

Tool-integrated reasoning (TIR) has become a key approach for improving large reasoning models (LRMs) on complex problems. Prior work has mainly studied when to invoke tools, while overlooking how tools are applied. We identify two common patterns: a calculator pattern that uses code for direct computation, and an algorithmic pattern that encodes problems as programs. Misaligned choices often cause failures even when reasoning is sound. We propose a two-stage framework that first builds code competence from both patterns and then aligns pattern selection with teacher preferences. Across challenging math datasets, our pattern-aware method substantially improves both code usage and accuracy, for instance raising Code@1 on MATH500 from 64.0% to 70.5% and on AIME24 from 26.7% to 50.0%. These gains highlight the effectiveness of a pattern-aware approach for tool-integrated reasoning.
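
The TL;DR names Direct Preference Optimization (DPO) as the Stage 2 objective. As a minimal sketch, the commonly used per-pair DPO loss is shown below. It assumes the "chosen" response is the teacher-preferred pattern and the "rejected" response is the other pattern for the same problem. The function name, argument names, and the default `beta` are illustrative assumptions, not details taken from the paper.

```python
import math

def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(x)) = log(1 + exp(-x)), computed stably for either sign of x.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

# When the policy matches the reference, the loss equals log(2).
baseline = dpo_pair_loss(-12.0, -15.0, -12.0, -15.0)
# Raising the chosen response's log-probability relative to the reference lowers the loss.
improved = dpo_pair_loss(-10.0, -15.0, -12.0, -15.0)
```

Under this reading, one preference pair per problem would correspond to the teacher-provided preferences described for the second stage.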

Paper Structure

This paper contains 15 sections, 2 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Pattern mismatch in tool-integrated reasoning. Left: in $1000! \div (800! \times 2!)$, an algebraic approach succeeds while direct computation overflows. Right: in finding the first 10-digit prime in $\pi$, a symbolic approach fails from context limits while a scanning approach succeeds. These cases show that success depends on the chosen tool-use pattern.
  • Figure 2: Two-stage training framework. In the 1st phase, each problem is expanded into both calculator- and algorithmic-style solutions ($2N$ data) to build code competence, and in the 2nd phase, teacher-provided preferences on $N$ problems guide the student to become pattern-aware.
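
The left-hand case of Figure 1 can be reproduced in a minimal sketch. It assumes the "direct computation" is carried out in floating point; Python's built-in integers are arbitrary-precision and would not overflow, and the caption does not say which arithmetic the model's code used. The function names are hypothetical.

```python
import math

def direct_float_ratio():
    """Evaluate each factorial separately, then divide in floating point."""
    # 1000! is far larger than the biggest double, so this conversion
    # raises OverflowError before any division happens.
    numerator = float(math.factorial(1000))
    denominator = float(math.factorial(800)) * 2.0
    return numerator / denominator

def simplified_ratio():
    """Cancel 800! algebraically: 1000!/800! = 801 * 802 * ... * 1000, then divide by 2! = 2."""
    return math.prod(range(801, 1001)) // 2

try:
    direct_float_ratio()
    direct_overflowed = False
except OverflowError:
    direct_overflowed = True

exact = simplified_ratio()
```

The same quantity is requested in both functions; only the way the code is structured around the problem changes, which is the distinction the figure draws.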