Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models

Xuyang Zhu, Sejoon Chang, Andrew Kuik

TL;DR

This work investigates how context-specific warnings can mitigate hallucinations in Retrieval-Augmented Generation (RAG) systems within an educational setting. By implementing a tailored warning system that flags potential inaccuracies at both the retrieval and generation stages, the study assesses effects on user reasoning, accuracy, and trust through a pilot quiz with history content. Results indicate that tailored warnings improve accuracy across hallucination levels and bolster trust compared to standard or no warnings, though they may introduce cognitive friction. These findings inform the design of AI-augmented learning tools that promote critical thinking and reflective decision-making in real-world, high-stakes contexts.

Abstract

Retrieval-Augmented Generation (RAG) systems offer a powerful approach to enhancing large language model (LLM) outputs by grounding them in retrieved, contextually relevant information. However, fairness and reliability concerns persist, because hallucinations can emerge at both the retrieval and generation stages and affect users' reasoning and decision-making. Our research explores how tailored warning messages, whose content depends on the specific context of the hallucination, shape user reasoning and actions in an educational quiz setting. Preliminary findings suggest that while warnings improve accuracy and awareness of high-level hallucinations, they may also introduce cognitive friction, leading to confusion and diminished trust in the system. By examining these interactions, this work contributes to the broader goal of AI-augmented reasoning: developing systems that actively support human reflection, critical thinking, and informed decision-making rather than passive information consumption.
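The idea of a warning whose content depends on where and how severely a hallucination arises can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical upstream detector that reports the stage (retrieval or generation) and a coarse severity level. The names `Stage`, `Level`, `HallucinationSignal`, `tailored_warning`, `warning_for`, and `STANDARD_WARNING`, the message wording, and the condition labels "none", "standard", and "tailored" are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Pipeline stage where a possible hallucination was flagged (assumed)."""
    RETRIEVAL = "retrieval"
    GENERATION = "generation"


class Level(Enum):
    """Coarse severity of the flagged hallucination (assumed scale)."""
    NONE = 0
    LOW = 1
    HIGH = 2


@dataclass
class HallucinationSignal:
    stage: Stage
    level: Level
    detail: str  # the suspicious passage or claim, echoed back to the user


# Hypothetical static message for the "standard" (non-tailored) condition.
STANDARD_WARNING = "AI-generated answers may contain errors. Please verify them."


def tailored_warning(signal: Optional[HallucinationSignal]) -> Optional[str]:
    """Build a warning whose wording depends on the signal's stage and level."""
    if signal is None or signal.level is Level.NONE:
        return None  # nothing flagged, so no warning is shown
    qualifier = "likely" if signal.level is Level.HIGH else "possibly"
    if signal.stage is Stage.RETRIEVAL:
        return (f"The retrieved source is {qualifier} unreliable or off-topic "
                f"({signal.detail}). Check it against another source.")
    return (f"This part of the answer is {qualifier} unsupported by the retrieved "
            f"sources ({signal.detail}). Compare it with the source before answering.")


def warning_for(condition: str, signal: Optional[HallucinationSignal]) -> Optional[str]:
    """Return the warning text shown under a given (hypothetical) study condition."""
    if condition == "none":
        return None
    if condition == "standard":
        return STANDARD_WARNING  # identical text regardless of the signal
    if condition == "tailored":
        return tailored_warning(signal)
    raise ValueError(f"unknown condition: {condition!r}")
```

In this sketch the standard condition ignores the detector's signal entirely, while the tailored condition changes its wording with both stage and severity, which is the contrast the study's warning conditions are meant to probe.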

Paper Structure

This paper contains 5 sections and 3 figures.

Figures (3)

  • Figure 1: Accuracy on the question-answering task across levels of hallucinated output and across warning conditions.
  • Figure 2: Average trust in the system reported by pilot-study participants (scale of 1-5; 1 is the lowest, 5 is the highest).
  • Figure 3: Average ease of using the system reported by pilot-study participants (scale of 1-5; 1 is the lowest, 5 is the highest).
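The figures report aggregates per hallucination level and warning condition. A minimal sketch of how such aggregates could be computed from per-response records follows; it assumes a hypothetical record layout with `level`, `condition`, `correct`, and `score` fields, the function names are illustrative, and no study data is included.

```python
from collections import defaultdict
from statistics import mean


def accuracy_by_cell(responses):
    """Mean accuracy per (hallucination level, warning condition) cell.

    Each response is a dict with hypothetical keys 'level', 'condition',
    and 'correct' (bool).
    """
    cells = defaultdict(list)
    for r in responses:
        cells[(r["level"], r["condition"])].append(1.0 if r["correct"] else 0.0)
    return {cell: mean(scores) for cell, scores in cells.items()}


def mean_likert_by_condition(ratings, low=1, high=5):
    """Mean Likert rating (e.g. trust or ease, on a 1-5 scale) per warning condition.

    Each rating is a dict with hypothetical keys 'condition' and 'score';
    scores outside [low, high] are rejected.
    """
    groups = defaultdict(list)
    for r in ratings:
        if not low <= r["score"] <= high:
            raise ValueError(f"rating {r['score']} outside {low}-{high}")
        groups[r["condition"]].append(r["score"])
    return {condition: mean(scores) for condition, scores in groups.items()}
```

Grouping by the (level, condition) pair keeps every cell of a Figure 1 style chart separate, while trust and ease ratings are grouped by condition only.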