Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning

Zhiwen Ruan, Yixia Li, He Zhu, Yun Chen, Peng Li, Yang Liu, Guanhua Chen

TL;DR

This paper tackles the inefficiency of uniform token-level supervision in supervised fine-tuning (SFT) of large language models for reasoning tasks. It introduces Critical Token Fine-tuning (CFT), which uses counterfactual perturbations to identify the tokens that are functionally indispensable for correctness and updates only those, preserving diversity at non-critical positions. Across five models from three families (Qwen, OLMo, LLaMA) and eleven benchmarks, CFT consistently outperforms standard SFT while updating fewer than $12\%$ of tokens. It also improves inference-time diversity (Pass@N) and provides a stronger initialization for reinforcement learning (RL) by sustaining higher exploration. The approach generalizes beyond mathematics, as demonstrated on medical QA, and offers a practical, general framework for efficient and robust fine-tuning.

Abstract

Large language models (LLMs) primarily rely on supervised fine-tuning (SFT) as a key method to adapt pre-trained models to domain-specific tasks such as mathematical reasoning. However, standard SFT uniformly penalizes all tokens, neglecting that only a small subset of critical tokens determines reasoning correctness. This uniform supervision often causes reduced output diversity and limited generalization. We propose Critical Token Fine-tuning (CFT), a simple yet effective approach that updates only tokens identified as functionally indispensable via counterfactual perturbations. By focusing gradient signals on these decisive reasoning steps while preserving the diversity of non-critical tokens, CFT can enhance both generation and diversity. Extensive experiments on five models across three families (Qwen, OLMo, LLaMA) and eleven mathematical reasoning benchmarks show that CFT, despite fine-tuning on less than 12% of tokens, consistently outperforms standard SFT. Moreover, CFT enables test-time scaling through improved sampling diversity and provides a stronger initialization for reinforcement learning, sustaining performance gains in later training stages while maintaining higher entropy for better exploration. These results highlight CFT as a practical and general framework for efficient and robust LLM fine-tuning.
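The selection step described above can be illustrated with a minimal sketch. It assumes a hypothetical `is_correct` callback that stands in for "continue from the perturbed response and check the final answer", and a hypothetical list of substitute tokens; how the paper actually chooses substitutes and regenerates continuations is not specified in this summary. The loss shown is a plain negative log-likelihood averaged over critical positions only, which is one straightforward reading of "updates only critical tokens".

```python
import math
from typing import Callable, List, Sequence


def find_critical_tokens(
    tokens: Sequence[str],
    alternatives: Sequence[str],
    is_correct: Callable[[List[str]], bool],
) -> List[int]:
    """Return a 0/1 mask with 1 where substituting the token breaks correctness.

    `alternatives[i]` is the (hypothetical) substitute for `tokens[i]`.
    """
    mask = []
    for i, alt in enumerate(alternatives):
        perturbed = list(tokens)
        perturbed[i] = alt  # counterfactual: swap exactly one token
        mask.append(0 if is_correct(perturbed) else 1)
    return mask


def masked_nll(token_log_probs: Sequence[float], mask: Sequence[int]) -> float:
    """Average negative log-likelihood over critical positions only."""
    selected = [-lp for lp, m in zip(token_log_probs, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def toy_is_correct(seq: List[str]) -> bool:
    """Toy checker (assumption): the response is correct if it evaluates to 12."""
    a, op, b = int(seq[1]), seq[2], int(seq[3])
    value = a * b if op == "*" else a + b
    return value == 12


tokens = ["so", "3", "*", "4"]
alternatives = ["thus", "5", "+", "6"]
mask = find_critical_tokens(tokens, alternatives, toy_is_correct)

# Toy per-token log-probabilities under the model being fine-tuned.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
loss = masked_nll(log_probs, mask)
```

In this toy case the connective "so" can be replaced without changing the answer, so it receives no gradient, while the operands and the operator are marked critical.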

Paper Structure

This paper contains 36 sections, 11 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Identifying critical tokens in CFT via counterfactual perturbation. A token is deemed non-critical if substituting it maintains correctness, indicating it is replaceable (e.g., calculate$\rightarrow$determine, green path). It is critical if the substitution causes an incorrect answer (e.g., consider$\rightarrow$break, red path).
  • Figure 2: Pass@N comparison between CFT (solid lines) and SFT (dashed lines) across multiple backbones. Note that the vertical axis range is narrower in (a) than in (b).
  • Figure 3: Reinforcement learning analysis. CFT-initialized models (red line) maintain higher entropy and achieve superior RL performance.
  • Figure 4: (a) Performance of SFT and CFT when incorporating sampled correct responses. (b) Performance of CFT with critical tokens identified from offline responses. Greedy Data SFT and Greedy Data CFT denote training on each model’s own greedy responses (from the results section).
  • Figure 5: Distributions of token categories for critical vs. normal tokens across three model families. Categories include numbers; operators (e.g., "+", "-"); punctuation (e.g., ".", ",", "?"); special characters (e.g., "$", "#", "@"); words; and others. Each bar shows the share of a category within either the critical or normal set.
  • ...and 1 more figure
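
Figure 2 reports Pass@N. As a reference point, the sketch below implements the widely used unbiased estimator, $1 - \binom{s-c}{N} / \binom{s}{N}$ for $s$ sampled responses of which $c$ are correct. Whether the paper computes Pass@N with this exact estimator is an assumption; the function and parameter names are illustrative.

```python
from math import comb


def pass_at_n(num_samples: int, num_correct: int, n: int) -> float:
    """Estimate the probability that at least one of n drawn samples is correct.

    num_samples: total responses sampled for a problem
    num_correct: how many of those responses are correct
    n:           budget of responses considered (n <= num_samples)
    """
    if n > num_samples:
        raise ValueError("n must not exceed num_samples")
    # Fewer than n incorrect samples: every subset of size n has a correct one.
    if num_samples - num_correct < n:
        return 1.0
    return 1.0 - comb(num_samples - num_correct, n) / comb(num_samples, n)


# Example: 10 samples, 3 correct.
p1 = pass_at_n(10, 3, 1)
p5 = pass_at_n(10, 3, 5)
```

With a fixed number of correct samples, the estimate rises as N grows, which is why higher sampling diversity tends to show up as a steeper Pass@N curve.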