Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models
Shuo Nie, Hexuan Deng, Chao Wang, Ruiyu Fang, Xuebo Liu, Shuangyong Song, Yu Li, Min Zhang, Xuelong Li
TL;DR
This work tackles the faithfulness hallucinations that small reasoning models (SRMs) produce during chain-of-thought (CoT) reasoning, observing that outcome-based rewards fail to penalize unfaithful intermediate steps. It proposes FaithRL, a reinforcement learning framework that combines implicit step-level rewards via Dynamic Truncated Resampling (DTR) with explicit step-level rewards derived from a Process Reward Model (PRM), penalizing unfaithful CoT steps while rewarding faithful prefixes. Across multiple SRMs and Open-Book QA benchmarks, FaithRL consistently reduces faithfulness hallucinations in both the CoT and the final answers, outperforming baselines while maintaining training efficiency. The approach offers a practical path toward more trustworthy SRMs by jointly optimizing task accuracy and reasoning faithfulness. Code is available at https://github.com/Easy195/FaithRL.
Abstract
As large language models become smaller and more efficient, small reasoning models (SRMs) are crucial for enabling chain-of-thought (CoT) reasoning in resource-constrained settings. However, they are prone to faithfulness hallucinations, especially in intermediate reasoning steps. Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained CoT evaluation, which can inadvertently reinforce unfaithful reasoning when the final answer is correct. To address these limitations, we propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model, together with an implicit truncated resampling strategy that generates contrastive signals from faithful prefixes. Experiments across multiple SRMs and Open-Book QA benchmarks demonstrate that FaithRL consistently reduces hallucinations in both the CoT and final answers, leading to more faithful and reliable reasoning. Code is available at https://github.com/Easy195/FaithRL.
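The gap between outcome-based and step-level rewards can be illustrated with a toy sketch. It is not the FaithRL implementation: the function names, the boolean per-step faithfulness labels (standing in for a process reward model's judgments), and the `penalty` value are all assumptions made for illustration only.

```python
from typing import List


def outcome_only_rewards(step_faithful: List[bool], answer_correct: bool) -> List[float]:
    """Outcome-based credit: every step inherits the trajectory-level reward."""
    reward = 1.0 if answer_correct else 0.0
    return [reward] * len(step_faithful)


def step_level_rewards(
    step_faithful: List[bool], answer_correct: bool, penalty: float = 1.0
) -> List[float]:
    """Hypothetical step-level credit: unfaithful steps lose `penalty` (assumed value)."""
    reward = 1.0 if answer_correct else 0.0
    return [reward if ok else reward - penalty for ok in step_faithful]


def faithful_prefix_length(step_faithful: List[bool]) -> int:
    """Number of leading steps before the first unfaithful one."""
    count = 0
    for ok in step_faithful:
        if not ok:
            break
        count += 1
    return count


# Hypothetical CoT with a correct final answer but an unfaithful second step.
labels = [True, False, True]
outcome = outcome_only_rewards(labels, answer_correct=True)
stepwise = step_level_rewards(labels, answer_correct=True)
prefix_len = faithful_prefix_length(labels)
```

Here the outcome-only scheme gives the unfaithful second step the same credit as the faithful ones, while the step-level scheme singles it out. The faithful prefix (the first step in this example) marks a point from which a truncated resampling strategy could, in principle, draw alternative continuations for contrast.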
