Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning
Ningning Xu, Yuxuan Jiang, Shubhashis Roy Dipta
TL;DR
This work addresses pattern mismatch in tool-integrated reasoning (TIR) for large reasoning models (LRMs) with a two-stage learning framework. Stage 1 builds code competence from data covering both the calculator pattern and the algorithmic pattern; Stage 2 aligns pattern choice with teacher signals using Direct Preference Optimization (DPO). Experiments on MATH500 and AIME24 show substantial gains in Code@1 and overall accuracy, indicating that how a model uses tools matters, not only when it invokes them. The pattern-aware approach improves robustness and points toward more reliable, executable reasoning in complex domains.
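Stage 2 relies on DPO. For reference, the sketch below computes the standard DPO loss for a single preference pair from summed response log-probabilities under the trainable policy and a frozen reference model. How the authors build their pairs is not stated here; one plausible reading, assumed for illustration, is that the chosen response uses the teacher-preferred pattern and the rejected response uses the other one. The function name `dpo_loss` and the default `beta = 0.1` are illustrative assumptions, not values taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a complete response
    under either the trainable policy or the frozen reference model.
    Assumption: "chosen" is the teacher-preferred tool-use pattern.
    """
    # Implicit reward margin: how much more the policy, relative to the
    # reference, favors the chosen response over the rejected one.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Loss is -log(sigmoid(x)) = log(1 + exp(-x)), computed in a way
    # that avoids overflow for large |x|.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    # Zero margin: the policy has not moved relative to the reference,
    # so the loss equals log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss shrinks as the policy shifts probability toward the chosen response relative to the reference, and `beta` controls how sharply that margin is rewarded.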
Abstract
Tool-integrated reasoning (TIR) has become a key approach for improving large reasoning models (LRMs) on complex problems. Prior work has mainly studied when to invoke tools, while overlooking how tools are applied. We identify two common patterns: a calculator pattern, which uses code for direct computation, and an algorithmic pattern, which encodes the problem as a program. Choosing a pattern that is misaligned with the problem often causes failures even when the reasoning is sound. We propose a two-stage framework that first builds code competence from both patterns and then aligns pattern selection with teacher preferences. Across challenging math datasets, our pattern-aware method substantially improves both code usage and accuracy, for instance, raising Code@1 on MATH500 from 64.0% to 70.5% and on AIME24 from 26.7% to 50.0%. These gains highlight the effectiveness of a pattern-aware approach to tool-integrated reasoning.
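To make the two patterns concrete, here is a minimal sketch on a hypothetical toy problem (not taken from the paper's datasets): how many integers in [1, 1000] are divisible by 3 or 5? In the calculator pattern, the model derives the formula itself and uses code only to evaluate it; in the algorithmic pattern, it encodes the problem as a program and lets execution do the work. The function names are illustrative assumptions.

```python
# Hypothetical toy problem: count integers in [1, n] divisible by 3 or 5.


def calculator_pattern(n: int = 1000) -> int:
    # Reasoning happens outside the code (inclusion-exclusion);
    # code is used only to evaluate the resulting arithmetic.
    return n // 3 + n // 5 - n // 15


def algorithmic_pattern(n: int = 1000) -> int:
    # The problem itself is encoded as a program: enumerate every
    # candidate and count the ones that satisfy the condition.
    return sum(1 for k in range(1, n + 1) if k % 3 == 0 or k % 5 == 0)


if __name__ == "__main__":
    print(calculator_pattern(), algorithmic_pattern())  # 467 467
```

Both patterns agree here, but they fail differently: the calculator version is only as correct as the hand-derived formula, while the enumeration becomes impractical for very large `n` (say, around 10^12) even though the closed form stays instant. Picking the pattern that suits the problem is the choice that Stage 2 targets.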
