How Susceptible are LLMs to Influence in Prompts?

Sotiris Anagnostidis, Jannis Bulian

TL;DR

This study systematically assesses how augmented inputs from other models influence LLMs acting as judges across diverse QA tasks and open models (Llama, Mixtral, Falcon). It formalizes an influence metric and shows that another model's prediction strongly anchors the judge's decision, and that an accompanying explanation sways the judge irrespective of its quality; authority and confidence cues increase the effect, but only slightly. Mitigation via prompting is largely ineffective, which points to the need for deeper reasoning and validation mechanisms. The work underscores the practical risk of external outputs shaping model judgments and argues for careful oversight and robust evaluation in real-world deployments. It provides a framework and empirical baseline for studying and mitigating prompt-driven susceptibility in LLMs.

Abstract

Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein. As LLMs grow in capability, understanding their prompt-sensitivity becomes increasingly crucial for ensuring reliable and robust performance, particularly since evaluating these models becomes more challenging. In this work, we investigate how current models (Llama, Mixtral, Falcon) respond when presented with additional input from another model, mimicking a scenario where a more capable model -- or a system with access to more external information -- provides supplementary information to the target model. Across a diverse spectrum of question-answering tasks, we study how an LLM's response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model. Specifically, we explore the influence of the presence of an explanation, the stated authoritativeness of the source, and the stated confidence of the supplementary input. Our findings reveal that models are strongly influenced, and when explanations are provided they are swayed irrespective of the quality of the explanation. The models are more likely to be swayed if the input is presented as being authoritative or confident, but the effect is small in size. This study underscores the significant prompt-sensitivity of LLMs and highlights the potential risks of incorporating outputs from external sources without thorough scrutiny and further validation. As LLMs continue to advance, understanding and mitigating such sensitivities will be crucial for their reliable and trustworthy deployment.
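
The setup described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the prompt wording, the `Trial` record, and the `switch_rate` measure are hypothetical and are not the paper's question template or its formal influence metric. The sketch only shows the general idea of adding another model's prediction (optionally with an explanation, a stated source, and a stated confidence) to the prompt and then counting how often the judge moves to that prediction.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """One multiple-choice question answered twice by the judge (hypothetical record)."""
    gold: str            # correct option label, e.g. "B"
    advocate: str        # option the other model argues for
    judge_unbiased: str  # judge's answer without the extra input
    judge_biased: str    # judge's answer with the extra input in the prompt


def build_prompt(
    question: str,
    options: dict[str, str],
    advocate: Optional[str] = None,
    explanation: Optional[str] = None,
    authority: Optional[str] = None,
    confidence: Optional[str] = None,
) -> str:
    """Assemble an illustrative prompt; the wording is an assumption, not the paper's."""
    lines = [f"Question: {question}"]
    lines += [f"{label}) {text}" for label, text in options.items()]
    if advocate is not None:
        source = authority or "Another model"
        hint = f"{source} suggests the answer is {advocate}"
        if confidence:
            hint += f" (stated confidence: {confidence})"
        lines.append(hint + ".")
        if explanation:
            lines.append(f"Explanation: {explanation}")
    lines.append("Answer:")
    return "\n".join(lines)


def switch_rate(trials: list[Trial]) -> float:
    """Fraction of trials in which the judge moves to the advocate's option.

    Only trials where the unbiased judge disagreed with the advocate are
    counted, since the judge cannot be "swayed" toward an answer it already gave.
    This is an illustrative measure, not the paper's influence metric.
    """
    eligible = [t for t in trials if t.judge_unbiased != t.advocate]
    if not eligible:
        return 0.0
    switched = sum(t.judge_biased == t.advocate for t in eligible)
    return switched / len(eligible)
```

Comparing `switch_rate` across conditions (with or without an explanation, with different `authority` or `confidence` strings) would give a rough view of the effects the abstract describes; splitting the trials by whether `advocate == gold` would separate correct from incorrect suggestions.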

Paper Structure

This paper contains 39 sections, 4 equations, 25 figures, and 10 tables.

Figures (25)

  • Figure 1: Leveraging external evidence, augmented inputs can help LLMs provide more informed answers.
  • Figure 2: Unbiased model performance.
  • Figure 3: Influence of advocates' responses on judges' predictions. Shading indicates whether an advocate provides an argument for why their choice is the correct one, as seen in Tab. \ref{tab:question_template} (bottom right).
  • Figure 4: Reported influence on the judge, based on the correctness of the explanation provided by the advocates.
  • Figure 5: Change in the probability between the unbiased $\textit{LLM}_{\boldsymbol{J}}(y_i | {\bm{x}}_i)$ and the biased predictions $\textit{LLM}_{\boldsymbol{J}}(y_i | {\bm{x}}_i, {\bm{e}}_{ij})$.
  • ...and 20 more figures