Critical or Compliant? The Double-Edged Sword of Reasoning in Chain-of-Thought Explanations

Eunkyu Park, Wesley Hanwen Deng, Vasudha Varadarajan, Mingxi Yan, Gunhee Kim, Maarten Sap, Motahhare Eslami

TL;DR

The paper investigates how chain-of-thought explanations influence trust and error detection in multimodal moral reasoning, revealing that explanations can both clarify and mislead. It introduces a perturbation-based framework that manipulates reasoning correctness (omissions, contradictions, hallucinations) and delivery tone (hedged, neutral, confident) in vision-language models using MORALISE image-text scenarios. A three-pronged trust calibration approach (error detection, agreement, and self-reported trust) is combined with model-side error profiling across open- and closed-source VLMs to map prevalence and detectability gaps. The findings show that users often rely on outcome agreement, with confident tones suppressing error scrutiny, underscoring the need for explanation interfaces that foster critical examination rather than blind trust.

Abstract

Explanations are often promoted as tools for transparency, but they can also foster confirmation bias; users may assume reasoning is correct whenever outputs appear acceptable. We study this double-edged role of Chain-of-Thought (CoT) explanations in multimodal moral scenarios by systematically perturbing reasoning chains and manipulating delivery tones. Specifically, we analyze reasoning errors in vision language models (VLMs) and how they impact user trust and the ability to detect errors. Our findings reveal two key effects: (1) users often equate trust with outcome agreement, sustaining reliance even when reasoning is flawed, and (2) the confident tone suppresses error detection while maintaining reliance, showing that delivery styles can override correctness. These results highlight how CoT explanations can simultaneously clarify and mislead, underscoring the need for NLP systems to provide explanations that encourage scrutiny and critical thinking rather than blind trust. All code will be released publicly.

Paper Structure

This paper contains 37 sections, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Overview of our user study. We ask whether users trust a model’s judgment given its reasoning chain. Trust depends on two determinants (reasoning correctness and delivery tone) and is analyzed via three dimensions: error detection (noticing flaws in the reasoning), agreement (endorsing the model’s judgment), and self-reported trust (expressed confidence).
  • Figure 2: (a) In our study, participants evaluate multimodal moral judgments generated by VLMs. Each trial presents an image–scenario pair, a model-produced chain-of-thought, and a moral judgment. (b) Building on analysis of LLM reasoning, we introduce three recurrent failure patterns as perturbations of otherwise clean chains: omissions, contradictions, and hallucinations. These manipulations respectively capture incompleteness, inconsistency, and ungrounded invention in model reasoning.
  • Figure 3: Reliance outcomes across error types. (a) Error detection, (b) agreement with the model’s moral judgment, and (c) self-reported trust in the model’s reasoning across error type and correctness.
  • Figure 4: Error detection rates by tone for reasoning chains generated for correct (top) and incorrect (bottom) judgments.
  • Figure 5: In-the-wild error distribution by models. Stacked bar plots of reasoning chain perturbations across six VLMs.
  • ...and 10 more figures