Why So Gullible? Enhancing the Robustness of Retrieval-Augmented Models against Counterfactual Noise

Giwon Hong, Jeonghwan Kim, Junmo Kang, Sung-Hyon Myaeng, Joyce Jiyoung Whang

TL;DR

This work examines how retrieval-augmented QA systems fail when retrieved documents contain conflicting information. It introduces a discriminator that is fine-tuned jointly with the QA task, along with prompting strategies that have GPT-3.5 identify perturbed documents before answering, and demonstrates robustness gains on open-domain QA benchmarks. To test against more realistic and varied noise, the authors release MacNoise, a large-scale, LLM-generated perturbation benchmark, and show that fine-tuned discriminators and in-context prompting complement each other. The results indicate that injecting the discriminator's decisions into the prompt improves stability and accuracy under knowledge conflicts, with implications for safer and more reliable retrieval-augmented systems.

Abstract

Most existing retrieval-augmented language models (LMs) assume a naive dichotomy within a retrieved document set: query-relevance and irrelevance. Our work investigates a more challenging scenario in which even the "relevant" documents may contain misleading or incorrect information, causing conflict among the retrieved documents and thereby negatively influencing model decisions as noise. We observe that existing LMs are highly brittle to the presence of conflicting information in both the fine-tuning and in-context few-shot learning scenarios. We propose approaches for handling knowledge conflicts among retrieved documents by explicitly fine-tuning a discriminator or prompting GPT-3.5 to elicit its discriminative capability. Our empirical results on open-domain QA show that these approaches significantly enhance model robustness. We also provide our findings on incorporating the fine-tuned discriminator's decision into the in-context learning process, proposing a way to exploit the benefits of two disparate learning schemes. Alongside our findings, we provide MacNoise, a machine-generated, conflict-induced dataset to further encourage research in this direction.

Paper Structure

This paper contains 56 sections, 1 equation, 9 figures, and 18 tables.

Figures (9)

  • Figure 1: In an ODQA setting, (a) a question is used to retrieve a set of (b) relevant documents which may contain conflict-causing documents that render (c) the retrieval-augmented LMs unreliable.
  • Figure 2: Illustration of our approaches to enhancing robustness to counterfactual noise. (a) Along with the decoder, the discriminator is jointly trained with the downstream task (QA), making the encoder produce corrupt-aware embeddings. (b) GPT-3.5 is prompted to find the perturbed documents before generating an answer. A zero-shot example is shown for brevity. (c) Fine-tuned discriminator output is injected into the prompt for GPT-3.5.
  • Figure 3: Comparison of GPT-3.5's stability for each discriminator setting. The shaded area represents the variance computed between the best and worst EM.
  • Figure 4: Results on TQA-open dev. FiD (i.e., discriminator) is trained on NQ-open and evaluated on TQA-open to examine the transferability of the robustness acquired through our method.
  • Figure 5: EM scores of the Semi-Parametric and our Semi-Parametric w/ $\text{Disc}^{\text{FiD}}$ on the NQ-open dev w/ different perturbations: Entity Replacement or MacNoise. The discriminator is fine-tuned independently (either w/ Entity Replacement or MacNoise) or jointly (Joint Training) on the NQ-open train.
  • ...and 4 more figures
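
The idea in Figure 2(c), where the fine-tuned discriminator's output is injected into the GPT-3.5 prompt, can be illustrated with a short sketch. This is not the authors' implementation: the `Doc` type, the `build_prompt` function, the prompt wording, and the 0.5 threshold are all hypothetical, chosen only to show how a per-document "perturbed or not" decision might be surfaced to the language model before it answers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Doc:
    text: str
    # Hypothetical discriminator score in [0, 1]: the estimated probability
    # that this retrieved document has been perturbed.
    p_perturbed: float


def build_prompt(question: str, docs: Sequence[Doc], threshold: float = 0.5) -> str:
    """Assemble a QA prompt that tags each document with the discriminator's decision.

    The wording and the threshold are illustrative assumptions, not the
    prompt used in the paper.
    """
    lines = [
        "Answer the question using the documents below.",
        "Documents marked [perturbed] may contain incorrect information.",
        "",
    ]
    for i, doc in enumerate(docs, start=1):
        tag = "perturbed" if doc.p_perturbed >= threshold else "original"
        lines.append(f"Document {i} [{tag}]: {doc.text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    retrieved = [
        Doc("Paris is the capital of France.", p_perturbed=0.1),
        Doc("Lyon is the capital of France.", p_perturbed=0.9),
    ]
    print(build_prompt("What is the capital of France?", retrieved))
```

Passing the decision as an inline tag keeps the language model free to weigh the flagged document rather than dropping it outright. Filtering flagged documents before prompting would be an alternative design, at the cost of discarding evidence whenever the discriminator is wrong.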