Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning

Lexiang Tang, Weihao Gao, Bingchen Zhao, Lu Ma, Qiao Jin, Bang Yang, Yuexian Zou

TL;DR

This work proposes Thinking by Subtraction, a confidence-driven contrastive decoding approach that improves reasoning reliability through targeted token-level intervention, significantly improving accuracy across mathematical reasoning benchmarks while substantially reducing output length.

Abstract

Recent work on test-time scaling for large language model (LLM) reasoning typically assumes that allocating more inference-time computation uniformly improves correctness. However, prior studies show that reasoning uncertainty is highly localized: a small subset of low-confidence tokens disproportionately contributes to reasoning errors and unnecessary output expansion. Motivated by this observation, we propose Thinking by Subtraction, a confidence-driven contrastive decoding approach that improves reasoning reliability through targeted token-level intervention. Our method, Confidence-Driven Contrastive Decoding (CCD), detects low-confidence tokens during decoding and intervenes selectively at these positions. It constructs a contrastive reference by replacing high-confidence tokens with minimal placeholders, and refines predictions by subtracting this reference distribution at low-confidence locations. Experiments show that CCD significantly improves accuracy across mathematical reasoning benchmarks while substantially reducing output length, with minimal KV-cache overhead. As a training-free method, CCD enhances reasoning reliability through targeted low-confidence intervention without computational redundancy. Our code will be made available at: https://github.com/bolo-web/CCD.
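
The decoding rule described in the abstract can be illustrated with a minimal sketch of a single decoding step. This is not the authors' implementation. Several things here are assumptions made only for illustration: confidence is taken to be the top-1 probability of the main distribution, the threshold `tau` and weight `alpha` are hypothetical parameters, the subtraction is done in log-probability space, and the function name `ccd_step` is invented. The reference logits are simply passed in; in the paper they come from a context in which high-confidence tokens have been replaced with placeholders.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def ccd_step(main_logits, ref_logits, tau=0.5, alpha=1.0):
    """Choose the next token id for one decoding step.

    Returns (token_id, intervened), where `intervened` is True only
    when the step was treated as low-confidence.
    """
    logp_main = log_softmax(main_logits)
    best = max(range(len(logp_main)), key=lambda i: logp_main[i])

    # Assumption: confidence is the top-1 probability of the main model.
    confidence = math.exp(logp_main[best])
    if confidence >= tau:
        # High-confidence position: ordinary greedy decoding, no intervention.
        return best, False

    # Low-confidence position: subtract the reference distribution
    # (hypothetical form: log p_main - alpha * log p_ref).
    logp_ref = log_softmax(ref_logits)
    scores = [lm - alpha * lr for lm, lr in zip(logp_main, logp_ref)]
    token = max(range(len(scores)), key=lambda i: scores[i])
    return token, True
```

For example, `ccd_step([2.0, 1.9, 0.0], [3.0, 0.0, 0.0])` falls below the threshold and the subtraction moves the choice from token 0 to token 1, while `ccd_step([5.0, 0.0, 0.0], [3.0, 0.0, 0.0])` is confident and leaves the greedy choice untouched. Contrastive decoding schemes commonly also restrict candidates to tokens the main model finds plausible; that constraint is omitted here for brevity.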

Paper Structure (39 sections, 15 equations, 10 figures, 4 tables, 1 algorithm)

Figures (10)

  • Figure 1: Trajectory-level relationship between token confidence and final answer correctness, across 2,880 reasoning trajectories generated by Qwen3-8B on AIME24.
  • Figure 2: Overview of Confidence-Driven Contrastive Decoding (CCD). The decoding process consists of four key components: (1) online estimation of token-level confidence; (2) confidence-driven token selection that partitions chain-of-thought tokens into low-confidence (LC) and high-confidence (HC) sets; (3) contrastive decoding triggered at LC-CoT tokens to refine uncertain predictions; and (4) dual key-value (KV) cache maintenance to support selective intervention without disrupting standard autoregressive decoding.
  • Figure 3: Ablation on different replacement intervals.
  • Figure 4: Ablation on using different special tokens.
  • Figure 6: Token-level confidence improvement at low-confidence decoding positions.
  • ...and 5 more figures