Seeing the Reasoning: How LLM Rationales Influence User Trust and Decision-Making in Factual Verification Tasks

Xin Sun, Shu Wei, Jos A Bosch, Isao Echizen, Saku Sugawara, Abdallah El Ali

TL;DR

Correct rationales and certainty cues increased trust, decision confidence, and AI advice adoption, whereas uncertainty cues reduced them. Presentation format had no significant effect, suggesting users were less sensitive to how reasoning was revealed than to its reliability.

Abstract

Large Language Models (LLMs) increasingly show reasoning rationales alongside their answers, turning "reasoning" into a user-interface element. While step-by-step rationales are typically associated with model performance, how they influence users' trust and decision-making in factual verification tasks remains unclear. We ran an online study (N=68) manipulating three properties of LLM reasoning rationales: presentation format (instant vs. delayed vs. on-demand), correctness (correct vs. incorrect), and certainty framing (none vs. certain vs. uncertain). We found that correct rationales and certainty cues increased trust, decision confidence, and AI advice adoption, whereas uncertainty cues reduced them. Presentation format did not have a significant effect, suggesting users were less sensitive to how reasoning was revealed than to its reliability. Participants indicated that they use rationales primarily to audit outputs and calibrate trust, and that they expect rationales in stepwise, adaptive forms with certainty indicators. Our work shows that user-facing rationales can support decision-making yet, if poorly designed, miscalibrate trust.

Paper Structure (22 sections, 2 figures, 3 tables)

Figures (2)

  • Figure 1: (a) Study procedure and design: participants were assigned to one rationale presentation format (Instant/Delayed/On-demand; between-subjects) and completed six trials covering all combinations of correctness (correct/incorrect) and certainty cue (none/certain/uncertain; within-subjects), with counterbalanced order. (b) Web interface: the query, LLM answer, and rationale were shown according to the assigned format, with the certainty cue when applicable.
  • Figure 2: Main effects of certainty cues and rationale correctness. In each sub-figure, the left panel shows the certainty-cue effect and the right panel shows the correctness effect. The y-axis shows the mean value of the corresponding measure (Likert-scale score for trust and confidence; proportion for advice adoption). Bars are means; half-violins show distributions; the $(F, p)$ values in the upper left of each sub-figure are ANOVA results; brackets mark post-hoc comparisons (*$p<.05$, **$p<.01$, ***$p<.001$).
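
The mixed design summarized in Figure 1 (one between-subjects format, six within-subjects trials) can be illustrated with a minimal Python sketch. The factor levels come from the abstract; the function names, the round-robin format assignment, and the rotation-based counterbalancing are hypothetical choices made for illustration, since the source does not state how assignment or counterbalancing was implemented.

```python
from itertools import product

# Factor levels as named in the abstract.
FORMATS = ["instant", "delayed", "on-demand"]   # between-subjects
CORRECTNESS = ["correct", "incorrect"]          # within-subjects
CERTAINTY = ["none", "certain", "uncertain"]    # within-subjects


def within_conditions():
    """All correctness x certainty combinations (2 x 3 = 6 trials)."""
    return list(product(CORRECTNESS, CERTAINTY))


def assign_format(participant_id):
    """Hypothetical round-robin assignment to one presentation format."""
    return FORMATS[participant_id % len(FORMATS)]


def trial_order(participant_id):
    """Hypothetical counterbalancing: cyclically rotate the six conditions
    so that each condition appears in each position across participants."""
    conds = within_conditions()
    shift = participant_id % len(conds)
    return conds[shift:] + conds[:shift]


if __name__ == "__main__":
    for pid in range(3):
        print(pid, assign_format(pid), trial_order(pid))
```

Every participant sees all six within-subjects conditions exactly once; only the order and the presentation format differ between participants. A real study might instead use a Latin square or randomized orders.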