Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models

Yuhui Wang, Changjiang Li, Guangke Chen, Jiacheng Liang, Ting Wang

TL;DR

This paper introduces FARL, a novel fine-tuning framework that integrates memory unlearning with reinforcement learning. By suppressing retrieval shortcuts during fine-tuning, FARL enhances generalizable reasoning capabilities.

Abstract

Large reasoning models (LRMs) exhibit unprecedented capabilities in solving complex problems through Chain-of-Thought (CoT) reasoning. However, recent studies reveal that their final answers often contradict their own reasoning traces. We hypothesize that this inconsistency stems from two competing mechanisms for generating answers: CoT reasoning and memory retrieval. To test this hypothesis, we conduct controlled experiments that challenge LRMs with misleading cues during reasoning and/or corrupted answers during retrieval. Our results across models and datasets confirm that both mechanisms operate simultaneously, with their relative dominance influenced by multiple factors: problem domains, model scales, and fine-tuning approaches (e.g., reinforcement learning vs. distillation). The findings reveal a critical limitation in current reasoning fine-tuning paradigms: models can exploit the retrieval mechanism as a shortcut, effectively "hacking" the reward signal and undermining genuine reasoning development. To address this challenge, we introduce FARL, a novel fine-tuning framework that integrates memory unlearning with reinforcement learning. By carefully suppressing retrieval shortcuts during the fine-tuning process, FARL promotes reasoning-dominant behavior and enhances generalizable reasoning capabilities. The code is available at https://github.com/ZJUWYH/FARL.

Paper Structure

This paper contains 27 sections, 5 equations, 11 figures, 7 tables, 1 algorithm.

Figures (11)

  • Figure 1: Joint influence of reasoning and retrieval on LRM's answer generation.
  • Figure 2: Joint influences of retrieval and reasoning across datasets and domains.
  • Figure 3: Comparison of reasoning-retrieval influence (a) across datasets and domains and (b) between distillation-based and RL-based models (separated by the dashed line).
  • Figure 4: Relation between model size and (a) PER, (b) T-PSR, (c) R-PSR, and the sum of R-PSR and T-PSR in the combined perturbation experiment with (d) aligned and (e) disparate target answers.
  • Figure 5: AUC results of R1-Llama-8B on Math&Logic domain of MMLU dataset.
  • ...and 6 more figures