ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator

Junda Zhu; Lingyong Yan; Haibo Shi; Dawei Yin; Lei Sha

TL;DR

The paper addresses hallucinations in retrieval-augmented QA caused by noisy or fabricated retrieved content. It introduces ATM, a two-agent adversarial framework where an Attacker fabricates and permutes retrieved documents while a Generator learns to produce golden answers despite the noise, using multi-agent iterative tuning (MITO) that combines SFT, KL regularization, and DPO-guided adversarial updates. Empirical results across four knowledge-intensive QA datasets show that ATM achieves consistent gains over state-of-the-art robustness baselines, with convergence within a few iterations. The work demonstrates a practical path to robust RAG-QA systems in noisy information environments and suggests future work on joint retriever-generator optimization.
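The Attacker's two operations named above (fabricating documents, then permuting the retrieved list) can be sketched at the data level as follows. This is a minimal, hypothetical sketch, not the authors' code: `build_attacked_context` and `make_fabrication` are illustrative names, the Attacker LLM is replaced by a plain callable, and the choice to let each fabrication displace one of the lowest-ranked retrieved documents (keeping the total at 10, as in the Figure 4 setup) is an assumption.

```python
import random
from typing import Callable, List

def build_attacked_context(
    question: str,
    retrieved_docs: List[str],
    make_fabrication: Callable[[str], str],  # hypothetical stand-in for the Attacker LLM
    num_fabrications: int,
    total_docs: int = 10,
    seed: int = 0,
) -> List[str]:
    """Mix fabricated documents into a retrieved list, then permute the result.

    The list length is capped at total_docs, so every fabrication displaces
    one of the lowest-ranked retrieved documents (an assumption of this sketch).
    """
    if not 0 <= num_fabrications <= total_docs:
        raise ValueError("num_fabrications must lie in [0, total_docs]")
    rng = random.Random(seed)  # seeded so the permutation is reproducible
    # Fabrication: ask the (stub) Attacker for misleading documents.
    fabrications = [make_fabrication(question) for _ in range(num_fabrications)]
    # Keep the top-ranked retrieved documents that still fit in the budget.
    kept = list(retrieved_docs[: total_docs - num_fabrications])
    docs = kept + fabrications
    # List permutation: shuffle the relative order of all documents.
    rng.shuffle(docs)
    return docs


# Usage with a trivial stub Attacker:
docs = [f"doc{i}" for i in range(10)]
ctx = build_attacked_context("Who wrote X?", docs, lambda q: f"FAKE: {q}", num_fabrications=3)
print(len(ctx), sum(d.startswith("FAKE") for d in ctx))  # 10 3
```

Shuffling after insertion means the Generator cannot learn to trust or distrust a document merely by its position in the list.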

Abstract

Large language models (LLMs) have been shown to benefit substantially from retrieval-augmented generation (RAG), which alleviates hallucinations on knowledge-intensive questions. RAG uses information retrieval techniques to inject external knowledge from semantically relevant documents as input context. However, because today's Internet is flooded with noisy and fabricated content, RAG systems are inevitably vulnerable to such noise and prone to responding incorrectly. To this end, we propose to optimize the retrieval-augmented Generator with an Adversarial Tuning Multi-agent system (ATM). ATM steers the Generator toward a robust judgment of which documents are useful for question answering, with the help of an auxiliary Attacker agent, by adversarially tuning the two agents over several iterations. After rounds of multi-agent iterative tuning, the Generator can better discriminate useful documents from fabrications. Experimental results verify the effectiveness of ATM and show that the Generator achieves better performance than state-of-the-art baselines.
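The alternating schedule described in the abstract (Attacker and Generator tuned against each other for several rounds) can be sketched as a driver loop. This covers the control flow only and is a hedged sketch under stated assumptions: `attack`, `tune_generator`, and `tune_attacker` are hypothetical callables standing in for the real LLM training steps (per the TL;DR, SFT with KL regularization for the Generator and DPO for the Attacker), the update order within one round is assumed, and the toy "models" in the usage are plain integers.

```python
from typing import Any, Callable, List, Tuple

def adversarial_tuning(
    attacker: Any,
    generator: Any,
    attack: Callable[[Any], list],                  # hypothetical: Attacker -> attacked contexts
    tune_generator: Callable[[Any, list], Any],     # hypothetical: one Generator tuning step
    tune_attacker: Callable[[Any, Any, list], Any], # hypothetical: one Attacker tuning step
    iterations: int,
) -> Tuple[Any, Any, List[list]]:
    """Alternate Attacker and Generator updates for a fixed number of rounds."""
    history: List[list] = []
    for _ in range(iterations):
        # 1. The current Attacker produces attacked (fabricated/permuted) contexts.
        attacked = attack(attacker)
        # 2. The Generator is tuned to answer correctly despite those attacks.
        generator = tune_generator(generator, attacked)
        # 3. The Attacker is tuned against the updated Generator (order assumed).
        attacker = tune_attacker(attacker, generator, attacked)
        history.append(attacked)
    return attacker, generator, history


# Toy run: "models" are integers, purely to exercise the control flow.
a, g, hist = adversarial_tuning(
    attacker=1,
    generator=0,
    attack=lambda strength: ["fake"] * strength,        # stronger Attacker, more fakes
    tune_generator=lambda gen, batch: gen + len(batch),  # "robustness" grows with exposure
    tune_attacker=lambda att, gen, batch: att + 1,       # Attacker escalates each round
    iterations=3,
)
print(a, g, [len(b) for b in hist])  # 4 6 [1, 2, 3]
```

Keeping the training steps as injected callables separates the adversarial schedule from any particular model or training library.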

Paper Structure (43 sections, 12 equations, 7 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Retrieval-Augmented Language Models
Adversarial Learning and Robust RAG
ATM System
Attacker
Fabrication Generation
List Permutation
Generator
Multi-agent Iterative Tuning
Initial Tuning
Iteratively Adversarial Optimization
Experiments
Experimental Setup
Datasets
...and 28 more sections

Figures (7)

Figure 1: GPT-4 refuses to answer long-tail questions due to knowledge deficiency, but can generate correct answers with retrieved knowledge (RAG-QA). However, when fabrications are provided, it follows the fabricated document directly and generates a wrong answer. Our proposed ATM model can better utilize golden knowledge and resist the noise introduced by fabrications.
Figure 2: Overview of the proposed ATM System.
Figure 3: The Attacker's attack types. Fabrications are LLM-generated content containing misleading fake knowledge. List Permutation shuffles the relative order of retrieved documents.
Figure 4: Subspan EM of different Generators given different numbers of fabrications. The total number of documents (fabrications plus retrieved documents) remains $10$.
Figure 5: Frequency density diagram of the Generator's Log Loss when confronted with fabrications, as the tuning iteration increases. Log Loss is positively correlated with $\mathrm{PPL}$. "Win" denotes the positive samples for Attacker DPO tuning, which cause a higher $\mathrm{PPL}$, while "Lose" denotes the negative samples.
...and 2 more figures
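Figure 4 reports Subspan EM. A common definition in open-domain QA evaluation scores a prediction as correct when any normalized gold answer appears as a substring of the normalized prediction; the sketch below follows that definition with SQuAD-style normalization. The exact normalization used in the paper is an assumption here, and the function names are illustrative.

```python
import re
import string
from typing import Iterable

_PUNCT = set(string.punctuation)

def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def subspan_em(prediction: str, gold_answers: Iterable[str]) -> float:
    """1.0 if any non-empty normalized gold answer is a substring of the normalized prediction."""
    pred = normalize_answer(prediction)
    for gold in gold_answers:
        g = normalize_answer(gold)
        if g and g in pred:  # skip empty golds, which would trivially match
            return 1.0
    return 0.0


# Dataset-level score = mean of per-question scores.
preds = ["The answer is Barack Obama.", "Joe Biden"]
golds = [["Barack Obama"], ["Barack Obama", "Obama"]]
score = sum(subspan_em(p, g) for p, g in zip(preds, golds)) / len(preds)
print(score)  # 0.5
```

Because it only checks containment, Subspan EM tolerates verbose generations that wrap the answer in extra words, which suits free-form LLM output.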
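Figure 5 ties the Attacker's DPO preference data to the Generator's Log Loss, and hence $\mathrm{PPL}$, on fabricated inputs. The sketch below makes that relation concrete under several assumptions: the Generator's per-token log-probabilities of the golden answer are already available; "Win" is simply the candidate fabrication with the highest PPL and "Lose" the one with the lowest; and the Attacker is updated with the standard DPO loss (Rafailov et al., 2023) with β = 0.1. All helper names are hypothetical, and the paper's exact selection rule and loss may differ.

```python
import math
from typing import List, Sequence, Tuple

def log_loss(token_logprobs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the golden-answer tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)

def perplexity(token_logprobs: Sequence[float]) -> float:
    """PPL = exp(log loss), so it rises monotonically with log loss."""
    return math.exp(log_loss(token_logprobs))

def pick_win_lose(candidates: List[Tuple[str, Sequence[float]]]) -> Tuple[str, str]:
    """candidates: (fabrication, Generator's golden-answer token log-probs given it).

    Returns (win, lose): the fabrications with the highest and lowest Generator PPL.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    ranked = sorted(candidates, key=lambda c: perplexity(c[1]))
    return ranked[-1][0], ranked[0][0]

def dpo_loss(policy_logp_win: float, policy_logp_lose: float,
             ref_logp_win: float, ref_logp_lose: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * margin), computed stably."""
    margin = beta * ((policy_logp_win - ref_logp_win) - (policy_logp_lose - ref_logp_lose))
    # -log(sigmoid(m)) = log(1 + exp(-m)); rewrite for m < 0 to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


candidates = [("fab_a", [-0.1, -0.2]), ("fab_b", [-2.0, -1.5]), ("fab_c", [-0.9, -0.7])]
win, lose = pick_win_lose(candidates)
print(win, lose)  # fab_b fab_a
print(round(dpo_loss(0.0, 0.0, 0.0, 0.0), 4))  # 0.6931
```

The DPO loss falls below log 2 once the Attacker policy prefers the "Win" fabrication more strongly than the reference model does, which pushes the Attacker toward fabrications that raise the Generator's PPL.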
