MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models

Jingyu Hu, Shu Yang, Xilin Gong, Hongming Wang, Weiru Liu, Di Wang

TL;DR

MONICA addresses sycophantic bias in Large Reasoning Models (LRMs) through real-time monitoring and adaptive calibration during chain-of-thought reasoning. It introduces an induction-then-merge pipeline to construct a reasoning-time sycophancy dataset, then trains layer-specific monitors and calibrators guided by a sycophancy drift score ($SDS$) to detect and mitigate sycophancy mid-generation. The framework segments reasoning trajectories, applies contextual windows, and adaptively adjusts calibration strength via an intervention vector. It improves both intermediate reasoning and final task performance across 12 datasets and 3 LRMs. MONICA generalizes across cue types and models and mitigates sycophancy token-efficiently, offering a practical way to increase the reliability of reasoning-based AI in high-stakes settings.

Abstract

Large Reasoning Models (LRMs) suffer from sycophantic behavior: they tend to agree with users' incorrect beliefs and follow misinformation rather than maintain independent reasoning. This behavior undermines model reliability and poses societal risks. Mitigating LRM sycophancy requires monitoring how it emerges along the reasoning trajectory; however, current methods mainly judge and correct final answers, without examining how sycophancy develops during reasoning. To address this limitation, we propose MONICA, a novel Monitor-guided Calibration framework that monitors and mitigates sycophancy during model inference at the level of reasoning steps, without requiring the model to finish generating its complete answer. MONICA integrates a sycophancy monitor, which tracks sycophancy drift scores in real time during response generation, with a calibrator that dynamically suppresses sycophantic behavior when scores exceed predefined thresholds. Extensive experiments across 12 datasets and 3 LRMs demonstrate that our method effectively reduces sycophantic behavior in both intermediate reasoning steps and final answers, yielding robust performance improvements.
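
The monitor-then-calibrate loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear-probe monitor, the linear strength schedule, and every name below (`drift_score`, `calibrate`, `probe_w`, `direction`, `max_strength`) are assumptions made for illustration only. The sketch shows one idea: score each reasoning step's hidden state, and shift it against an intervention vector only when the score exceeds a threshold, with strength growing as the score rises.

```python
import math


def drift_score(hidden, probe_w, probe_b=0.0):
    """Hypothetical monitor: a linear probe squashed to (0, 1) by a sigmoid."""
    z = sum(w * h for w, h in zip(probe_w, hidden)) + probe_b
    return 1.0 / (1.0 + math.exp(-z))


def calibrate(hidden, direction, score, threshold=0.5, max_strength=1.0):
    """Hypothetical calibrator: subtract a scaled intervention vector.

    Nothing changes at or below the threshold. Above it, the strength
    grows linearly with the excess score (an assumed schedule).
    """
    if score <= threshold:
        return list(hidden)
    strength = max_strength * (score - threshold) / (1.0 - threshold)
    return [h - strength * d for h, d in zip(hidden, direction)]


def monitor_and_calibrate(step_states, probe_w, direction, threshold=0.5):
    """Score each reasoning step and calibrate only the flagged ones."""
    results = []
    for hidden in step_states:
        score = drift_score(hidden, probe_w)
        results.append((score, calibrate(hidden, direction, score, threshold)))
    return results


# Toy usage: two 2-dimensional "hidden states", one neutral and one drifting.
probe_w = [1.0, 0.0]      # assumed probe weights
direction = [1.0, 0.0]    # assumed intervention vector
steps = [[0.0, 0.0], [2.0, 0.0]]
results = monitor_and_calibrate(steps, probe_w, direction)
```

In this toy run the first step scores exactly at the threshold and is left untouched, while the second step is pulled back along the intervention direction, which lowers its score when it is re-evaluated by the same probe.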

Paper Structure

This paper contains 31 sections, 8 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Comparison of different methods. (1) Raw LRMs misled by cues: wrong CoT and wrong answer. (2) Current optimization based on the entire response: correct answer but incorrect CoT. (3) Our MONICA: correct CoT and correct answer.
  • Figure 2: The Proposed Workflow of the Monitor-guided Calibration Framework
  • Figure 3: Sycophantic and Non-sycophantic Patterns Extraction
  • Figure 4: $\Delta \text{RR} \Uparrow$ Relative to Without-Mitigation Performance on MMLU with DeepSeek-Llama8B
  • Figure 5: Thinking and Response Performance Comparisons on MMLU with DeepSeek-Llama8B
  • ...and 3 more figures