Citations and Trust in LLM Generated Responses

Yifan Ding, Matthew Facciani, Amrit Poudel, Ellen Joyce, Salvador Aguinaga, Balaji Veeramani, Sanmitra Bhattacharya, Tim Weninger

TL;DR

This paper investigates how citations influence user trust in large language model (LLM) responses through an anti-monitoring framework and social proof. It implements a live QA experiment where ChatGPT-4 outputs are shown with zero, one, or five citations that are either valid or random, measuring self-reported trust and whether participants check citations. The findings show that citations generally boost trust, even when random, while actively checking citations reduces trust, with no clear advantage to using more than a single citation. The work also analyzes question type and demographics, revealing that political and fact-based questions tend to be rated more trustworthy, and that citation checking behavior varies across groups. These insights have practical implications for designing Retrieval Augmented Generation systems and user interfaces that balance transparency with perceived reliability.

Abstract

Question answering systems are rapidly advancing, but their opaque nature may impact user trust. We explored trust through an anti-monitoring framework, where trust is predicted to be correlated with the presence of citations and inversely related to checking citations. We tested this hypothesis with a live question-answering experiment that presented text responses generated using a commercial chatbot along with varying citations (zero, one, or five), both relevant and random, and recorded whether participants checked the citations and their self-reported trust in the generated responses. We found a significant increase in trust when citations were present, a result that held true even when the citations were random; we also found a significant decrease in trust when participants checked the citations. These results highlight the importance of citations in enhancing trust in AI-generated content.

Paper Structure (10 sections, 7 figures, 6 tables)

Figures (7)

  • Figure 1: AI Chatbot system answering a user's question with five hyperlink citations. The presence of citations significantly increases the user's trust of the response.
  • Figure 2: Methodology of the Citation Trust Experiment. Participants are assigned to zero (purple), one (blue), or five (green) citations, which can be either valid (dotted-line) or random (solid-line). A participant may ask any question, and then rates the response on a scale of 1 to 10. This is repeated for ten total questions and a demographics survey is asked at the end.
  • Figure 3: Citations increase perceived trustworthiness, but random citations decrease perceived trustworthiness. Regression coefficients $\beta$ and their standard errors are plotted on the x-axis.
  • Figure 4: Checking citations decreases perceived trust. Political and factual questions have a higher perceived trust. Regression coefficients $\beta$ and their standard errors are plotted on the x-axis.
  • Figure 5: Visualization of question topics asked by participants. Although we observe a substantial overlap in the kinds of questions asked by our participants (black) compared to AskReddit (red) and Quora (blue), we also identify several topical gaps. Some representative samples of these topical gaps are illustrated on the right. An interactive visualization of this figure is included in the supplementary material.
  • ...and 2 more figures