
Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training

Feiteng Fang, Yuelin Bai, Shiwen Ni, Min Yang, Xiaojun Chen, Ruifeng Xu

TL;DR

This work addresses the vulnerability of retrieval-augmented language models to noisy retrieved passages by introducing a three-type retrieval-noise taxonomy (relevant, irrelevant, counterfactual) and a novel adaptive adversarial training framework (RAAT). RAAT dynamically selects and augments adversarial noise during training, and adds a noise-awareness auxiliary task to help the model internally recognize noisy contexts. A dedicated benchmark, RAG-Bench, constructed from Natural Questions, TriviaQA, and WebQ, evaluates robustness across noise types and shows that RAAT yields consistent improvements in F1 and EM over strong baselines on LLaMA-2 7B. The approach offers practical gains for real-world RAG systems and provides a foundation for further joint optimization with retrievers and broader domain coverage.

Abstract

Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. However, inappropriate retrieved passages can hinder the LLMs' capacity to generate comprehensive and high-quality responses. Prior RAG studies on robustness to retrieval noise often confine themselves to a limited set of noise types, deviating from real-world retrieval environments and limiting practical applicability. In this study, we first investigate retrieval noises and categorize them into three distinct types, reflecting real-world environments. We analyze the impact of these retrieval noises on the robustness of LLMs. We then propose a novel RAG approach known as Retrieval-augmented Adaptive Adversarial Training (RAAT). RAAT leverages adaptive adversarial training to dynamically adjust the model's training process in response to retrieval noises. Concurrently, it employs multi-task learning to ensure the model's capacity to internally recognize noisy contexts. Extensive experiments demonstrate that the LLaMA-2 7B model trained using RAAT exhibits significant improvements in F1 and EM scores under diverse noise conditions. For reproducibility, we release our code and data at: https://github.com/calubkk/RAAT.
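
The abstract describes RAAT only at a high level: the training process adapts to retrieval noise, and a multi-task objective teaches the model to recognize noisy contexts. The sketch below illustrates one plausible reading of that idea and is not taken from the released code. It assumes that each query is paired with four context variants (the golden context plus the three noise types), that the "adaptive" step picks the variant with the highest generation loss, and that a noise-classification loss is added with a fixed weight. The names `select_adversarial`, `training_objective`, and `cls_weight`, and all numeric values, are hypothetical.

```python
from typing import Dict, Tuple

# Context variants assumed for illustration: the golden context plus the
# three noise types named in the TL;DR.
CONTEXT_VARIANTS = ("golden", "relevant", "irrelevant", "counterfactual")


def select_adversarial(losses: Dict[str, float]) -> Tuple[str, float]:
    """Return the context variant with the highest generation loss.

    Assumption: "adaptive" selection means training on whichever variant
    currently hurts the model most.
    """
    name = max(losses, key=losses.get)
    return name, losses[name]


def training_objective(
    losses: Dict[str, float],
    noise_cls_loss: float,
    cls_weight: float = 0.5,  # hypothetical weight for the auxiliary task
) -> float:
    """Combine the worst-case generation loss with an auxiliary
    noise-classification loss (the multi-task part)."""
    _, worst = select_adversarial(losses)
    return worst + cls_weight * noise_cls_loss


# Made-up per-variant generation losses for a single query.
example_losses = {
    "golden": 0.4,
    "relevant": 0.7,
    "irrelevant": 0.5,
    "counterfactual": 1.2,
}
chosen, worst_loss = select_adversarial(example_losses)
total = training_objective(example_losses, noise_cls_loss=0.3)
print(chosen, worst_loss, total)
```

In a real training loop the per-variant losses would come from forward passes of the language model, and the selection would be repeated at every parameter update, so the chosen noise type can change as training progresses.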

Paper Structure (24 sections, 5 equations, 5 figures, 3 tables)

This paper contains 24 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: An illustrative example of the RAG process applied to question answering. The model predicts the correct answer with accurate retrieved text. However, it fails to produce the right answer when the retrieved text contains misleading or inaccurate information.
  • Figure 2: Exact match (EM) scores of various models under different types of retrieval noises. "Golden Context" denotes instances where LLMs respond to questions with reference to the golden retrieval context. "No Noise" indicates instances where LLMs answer questions without any retrieval. The experimental configurations of other models involve the introduction of different types of noises on the foundation of the "Golden Context".
  • Figure 3: The overview of our proposed RAAT method, which incorporates three distinct types of retrieval noises and the golden retrieval context during the training process.
  • Figure 4: The number of queries is 4,500 and the number of parameter updates is 9,000. The statistics in this figure show which types of retrieval noise RAAT selects each time the model parameters are updated.
  • Figure 5: The results of t-SNE visualization. Following the introduction of four types of adversarial samples (i.e., retrieval noises) into models tuned by various methods, the hidden state of the last token is extracted. Subsequently, dimensionality reduction using t-SNE, clustering, and visualization are performed. This visual representation includes three methods, namely $\text{RALM}_{golden}$, $\text{RetRobust}$, and $\text{RAAT}$.
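
The results above are reported as exact match (EM) and F1. As a point of reference, the following is a minimal sketch of how these two metrics are commonly computed for short-answer question answering, using the widely used SQuAD-style normalization (lowercasing, stripping punctuation and English articles). It is an assumption that the paper's evaluation follows this convention; the function names are illustrative and not taken from the released code.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower!", "eiffel tower"))
print(f1_score("Eiffel Tower in Paris", "eiffel tower"))
```

When a question has several acceptable gold answers, a common practice is to take the maximum score over them and then average across questions.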