
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

Xin Xu, Yan Xu, Tianhao Chen, Yuchen Yan, Chengwu Liu, Zaoyu Chen, Yufei Wang, Yichun Yin, Yasheng Wang, Lifeng Shang, Qun Liu, Lu Yin

TL;DR

The paper asks whether LLMs can autonomously choose between Chain-of-Thought (CoT) and Tool-Integrated Reasoning (TIR) when solving mathematical problems. It introduces TATA, an aptitude-aware data-selection framework that tailors supervised fine-tuning (SFT) data by measuring how well the base LLM performs on an anchor set under 1-shot CoT and TIR prompts, so that the fine-tuned model switches strategies on its own at test time. Across six math benchmarks, TATA achieves accuracy comparable or superior to TIR alone while improving inference efficiency, which shows the value of aligning training data with model aptitude. The work further analyzes aptitude-based data selection, transferability between models, and possible reinforcement-learning extensions, with practical implications for scalable, adaptive reasoning in LLMs.
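The aptitude measurement can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that $S_{\text{CoT}}^k$ (resp. $S_{\text{TIR}}^k$) is the base model's accuracy on the anchor set when the k-th candidate's CoT (resp. TIR) solution is the single in-context example, and that the selection rule $\mathcal{H}$ simply keeps the higher-scoring format, with ties going to CoT. The paper's exact rule may differ. All names (`Triplet`, `AnchorItem`, `SolveFn`, `aptitude_scores`, `select_sft_data`, `toy_solve`) are hypothetical; `toy_solve` is a fake stand-in for calling the base LLM and, for TIR, executing its code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Triplet:
    """A candidate training example: one query with a CoT and a TIR solution."""
    query: str
    cot: str
    tir: str


@dataclass
class AnchorItem:
    """An anchor-set problem with its reference answer."""
    question: str
    answer: str


# Hypothetical interface to the base LLM: (demo_query, demo_solution, question) -> answer.
SolveFn = Callable[[str, str, str], str]


def aptitude_scores(cand: Triplet, anchors: List[AnchorItem], solve: SolveFn) -> Tuple[float, float]:
    """Return (S_CoT^k, S_TIR^k), assumed here to be anchor-set accuracies under 1-shot prompts."""
    def accuracy(demo_solution: str) -> float:
        hits = sum(solve(cand.query, demo_solution, a.question) == a.answer for a in anchors)
        return hits / len(anchors)
    return accuracy(cand.cot), accuracy(cand.tir)


def select_sft_data(cands: List[Triplet], anchors: List[AnchorItem], solve: SolveFn) -> List[Tuple[str, str]]:
    """Assumed rule H: keep the solution format whose 1-shot demo scored higher (ties -> CoT)."""
    selected = []
    for cand in cands:
        s_cot, s_tir = aptitude_scores(cand, anchors, solve)
        selected.append((cand.query, cand.cot if s_cot >= s_tir else cand.tir))
    return selected


# --- Toy usage: a fake "model" standing in for real LLM calls (hypothetical) ---
def toy_solve(demo_query: str, demo_solution: str, question: str) -> str:
    if "sympy" in demo_solution:        # pretend the model cannot imitate this demo
        return "wrong"
    if question.startswith("compute"):  # arithmetic only comes out right with a code demo
        return "42" if "print(" in demo_solution else "41"
    return "even"                        # the parity question is easy either way


anchors = [AnchorItem("compute 6*7", "42"), AnchorItem("is 6*7 even or odd?", "even")]
cands = [
    Triplet("q1", "6*7 = 42, so the answer is 42.", "print(6*7)"),
    Triplet("q2", "Factor 42 = 2*3*7 by hand.", "import sympy\nprint(sympy.factorint(42))"),
]
print(select_sft_data(cands, anchors, toy_solve))
# [('q1', 'print(6*7)'), ('q2', 'Factor 42 = 2*3*7 by hand.')]
```

In the toy run, the first candidate's TIR demo lifts anchor accuracy from 0.5 to 1.0, so its TIR solution is kept; the second candidate's TIR demo drops accuracy to 0.0, so its CoT solution is kept. The point of the design is that the choice depends on how the specific base model responds to each demonstration, not on a fixed preference for either format.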

Abstract

Existing approaches to mathematical reasoning with large language models (LLMs) rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation. While efforts have been made to combine these methods, they primarily rely on post-selection or predefined strategies, leaving an open question: whether LLMs can autonomously adapt their reasoning strategy based on their inherent capabilities. In this work, we propose TATA (Teaching LLMs According to Their Aptitude), an adaptive framework that enables LLMs to personalize their reasoning strategy spontaneously, aligning it with their intrinsic aptitude. TATA incorporates base-LLM-aware data selection during supervised fine-tuning (SFT) to tailor training data to the model's unique abilities. This approach equips LLMs to autonomously determine and apply the appropriate reasoning strategy at test time. We evaluate TATA through extensive experiments on six mathematical reasoning benchmarks, using both general-purpose and math-specialized LLMs. Empirical results demonstrate that TATA effectively combines the complementary strengths of CoT and TIR, achieving superior or comparable performance with improved inference efficiency compared to TIR alone. Further analysis underscores the critical role of aptitude-aware data selection in enabling LLMs to make effective and adaptive reasoning decisions and align reasoning strategies with model capabilities.

Paper Structure

This paper contains 63 sections, 18 equations, 11 figures, 12 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Illustration of our research question. (a) Zhao et al. (2023) use another LLM to post-select between CoT and TIR. (b) Yue et al. (2023) fall back to CoT when TIR fails because of a syntax error or execution timeout. (c) Yang et al. (2024) control the choice between CoT and TIR through predefined inference prompts. (d) We aim to teach LLMs to choose the appropriate strategy spontaneously, according to their aptitude.
  • Figure 2: Overview of our Teaching LLMs According to Their Aptitude (TATA) framework. Here, $\mathcal{D}_{\text{orig}}$ denotes the original training set, $\mathcal{D}_{\text{aug}}$ represents the augmented training set obtained through rejection sampling with CoT only, and $\mathcal{D}$ refers to the candidate set consisting of (query, CoT, TIR) triplets. $\mathcal{D}_{\text{anchor}}$ is the anchor set of size $A$. $S_{\text{CoT}}^k$ and $S_{\text{TIR}}^k$ are scores calculated based on the LLMs' aptitude on the anchor set, elicited using 1-shot prompts. Finally, $\mathcal{H}$ represents the SFT data selection process. Fine-tuning on the resulting SFT data enables LLMs to spontaneously select between CoT and TIR at test time according to their aptitude.
  • Figure 3: The distribution of ($S_{\text{CoT}}^k - S_{\text{TIR}}^k$): Qwen2.5-0.5B (left), Qwen2.5-7B (middle), LLaMA-3-8B (right).
  • Figure 4: The distribution of $S_{\text{CoT}}^k$ (left), $S_{\text{TIR}}^k$ (middle), and ($S_{\text{CoT}}^k - S_{\text{TIR}}^k$) (right) for LLaMA-3-8B.
  • Figure 5: The distribution of $S_{\text{CoT}}^k$ (left), $S_{\text{TIR}}^k$ (middle), and ($S_{\text{CoT}}^k - S_{\text{TIR}}^k$) (right) for Qwen2.5-0.5B.
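Figures 3 to 5 plot the per-candidate difference $S_{\text{CoT}}^k - S_{\text{TIR}}^k$. As a hedged sketch of how such a distribution can be tabulated, the code below assumes each score is the fraction of the $A$ anchor problems the base model solves, so every difference is an exact multiple of $1/A$ in $[-1, 1]$ and can be binned on integer counts. The function name `difference_histogram` is hypothetical, and the input counts are illustrative placeholders, not numbers from the paper.

```python
from collections import Counter
from typing import Dict, List


def difference_histogram(cot_correct: List[int], tir_correct: List[int], anchor_size: int) -> Dict[float, int]:
    """Histogram of S_CoT^k - S_TIR^k over candidates k.

    Assumes each score is (#anchor problems solved) / anchor_size, so binning on the
    integer difference of correct counts is exact and avoids floating-point bin edges.
    """
    if len(cot_correct) != len(tir_correct):
        raise ValueError("need one CoT count and one TIR count per candidate")
    diffs = Counter(c - t for c, t in zip(cot_correct, tir_correct))
    # Emit only non-empty bins, ordered from -1 to +1.
    return {d / anchor_size: diffs[d] for d in range(-anchor_size, anchor_size + 1) if diffs[d]}


# Illustrative counts only (anchor set of size A = 4), not data from the paper.
hist = difference_histogram([4, 2, 1, 3], [1, 2, 3, 3], anchor_size=4)
print(hist)  # {-0.5: 1, 0.0: 2, 0.75: 1}
```

Under this reading, mass to the right of zero marks candidates whose CoT demonstration helped the base model more than the TIR one, and mass to the left marks the reverse.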