Chaos with Keywords: Exposing Large Language Models Sycophantic Hallucination to Misleading Keywords and Evaluating Defense Strategies

Aswin RRV, Nemika Tyagi, Md Nayem Uddin, Neeraj Varshney, Chitta Baral

TL;DR

This work analyzes sycophantic hallucination in LLMs, where models align answers to user expectations when exposed to misleading keywords. It introduces a pipeline for generating misleading keyword sets, evaluates five LLMs under generic and domain-specific prompts, and tests four mitigation strategies (in-context exemplars, precautionary instructions, internal knowledge augmentation, external knowledge augmentation). Factual accuracy is assessed with Google Gemini and human annotations across five domains, revealing substantial vulnerability to cue-based misinformation and mixed mitigation effectiveness. The findings highlight the need for robust, domain-aware defenses and provide directions for improving reliability and trustworthiness of LLMs in information-critical tasks.

Abstract

This study explores the sycophantic tendencies of Large Language Models (LLMs): their tendency to provide answers that match what users want to hear, even when those answers are not entirely correct. The motivation stems from a common behavior of individuals who search the internet for facts with partial or misleading knowledge. As with web search engines, users may recall fragments of misleading keywords and submit them to an LLM, hoping for a comprehensive response. Our empirical analysis of several LLMs shows the potential danger of these models amplifying misinformation when presented with misleading keywords. Additionally, we thoroughly assess four existing hallucination mitigation strategies for reducing LLMs' sycophantic behavior. Our experiments demonstrate the effectiveness of these strategies for generating factually correct statements. Furthermore, our analyses include knowledge-probing experiments on factual keywords and an examination of different categories of sycophancy mitigation.
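
The keyword-prompting setup described above can be illustrated with a minimal sketch. It assumes a generic instruction wording, a precautionary note, and optional in-context exemplars; the template strings and the names `build_prompt` and `factual_accuracy` are hypothetical and are not the prompts or code used in the paper. The fact-checking verdicts are assumed to come from an external judge (such as Gemini or a human annotator) and are passed in as booleans.

```python
from typing import Sequence, Tuple

# Hypothetical wording: the paper's actual prompts are not reproduced here.
BASE_TEMPLATE = "Generate a factual statement using these keywords: {keywords}."
PRECAUTION = (
    "Note: the keywords may be misleading. "
    "Only state what is factually correct."
)


def build_prompt(
    keywords: Sequence[str],
    precaution: bool = False,
    exemplars: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a keyword prompt, optionally with two of the mitigations."""
    parts = []
    # In-context exemplars: (keywords, correct statement) pairs shown first.
    for ex_keywords, ex_statement in exemplars:
        parts.append(f"Keywords: {ex_keywords}\nStatement: {ex_statement}")
    # Precautionary instruction: warn the model before the actual request.
    if precaution:
        parts.append(PRECAUTION)
    parts.append(BASE_TEMPLATE.format(keywords=", ".join(keywords)))
    return "\n\n".join(parts)


def factual_accuracy(verdicts: Sequence[bool]) -> float:
    """Percentage of generated statements judged factually correct."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    return 100.0 * sum(verdicts) / len(verdicts)
```

Calling `build_prompt(["Lionel Messi", "2014 FIFA World Cup", "Golden Boot"], precaution=True)` yields the precautionary note followed by the keyword request. The knowledge-augmentation strategies would add further context in the same way, which this sketch leaves out.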

Paper Structure

This paper contains 35 sections, 12 figures, and 13 tables.

Figures (12)

  • Figure 1: Prompting five different LLMs to generate a factual statement with three misleading keywords: "Lionel Messi, 2014 FIFA World Cup, Golden Boot". All five LLMs show sycophancy by generating factually incorrect statements. Note that a possible factually correct response to this prompt is "Lionel Messi did not win Golden Boot award in 2014 FIFA World Cup."
  • Figure 2: Model-specific percentage distribution of the four mitigation categories. We manually evaluated a uniform sample of 50 factual statements using the most effective mitigation strategy identified for each model. These are the samples whose factual accuracy changed from incorrect to correct after the mitigation was applied.
  • Figure 3: LLM performance on knowledge-probing questions. Every model answers at least 65% of the knowledge-probing questions correctly.
  • Figure 4: An example of generating a factual statement with non-misleading keywords. In this case, the Llama-13b model generated a factually inaccurate statement despite the keywords being correct.
  • Figure 5: The prompt used for querying Google Gemini. We use this prompt to fact-check the statement generated by the models.
  • ...and 7 more figures