Defending Against Disinformation Attacks in Open-Domain Question Answering

Orion Weller; Aleem Khan; Nathaniel Weir; Dawn Lawrie; Benjamin Van Durme

TL;DR

Open-domain QA systems are vulnerable to disinformation attacks that poison retrieval sources. The paper introduces a defense that combines query augmentation, which surfaces diverse contexts less likely to have been poisoned, with Confidence from Answer Redundancy (CAR), which gauges when the augmented evidence should influence predictions. A redundancy-based resolution strategy outperforms baselines, delivering gains of up to about 20 points of exact match (EM) on Natural Questions and TriviaQA for FiD and Atlas, even when open-source LLMs are used for augmentation. The approach is gradient-free and broadly applicable to existing pipelines, suggesting it could serve as a practical defense against real-world disinformation in knowledge-intensive NLP tasks.

Abstract

Recent work in open-domain question answering (ODQA) has shown that adversarial poisoning of the search collection can cause large drops in accuracy for production systems. However, little to no work has proposed methods to defend against these attacks. To do so, we rely on the intuition that redundant information often exists in large corpora. To find it, we introduce a method that uses query augmentation to search for a diverse set of passages that could answer the original question but are less likely to have been poisoned. We integrate these new passages into the model through the design of a novel confidence method, comparing the predicted answer to its appearance in the retrieved contexts (what we call Confidence from Answer Redundancy, i.e. CAR). Together these methods allow for a simple but effective way to defend against poisoning attacks that provides gains of nearly 20% exact match across varying levels of data poisoning/knowledge conflicts.
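
As a rough illustration of the CAR idea described in the abstract, the sketch below counts how many unique retrieved passages contain the predicted answer string and treats the prediction as confident when that count exceeds a threshold. The function names, the simple lowercase/whitespace normalization, and the default threshold of 5 (borrowed from the Figure 5 caption) are assumptions made for illustration, not the authors' implementation.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplified normalization)."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def answer_redundancy(predicted_answer: str, passages: list[str]) -> int:
    """Count the unique passages that contain the predicted answer string."""
    answer = normalize(predicted_answer)
    if not answer:
        return 0
    # Deduplicate passages so repeated copies are only counted once.
    unique_passages = {normalize(p) for p in passages}
    return sum(1 for p in unique_passages if answer in p)


def is_confident(predicted_answer: str, passages: list[str], threshold: int = 5) -> bool:
    """Hypothetical CAR check: confident if more than `threshold` passages agree."""
    return answer_redundancy(predicted_answer, passages) > threshold


if __name__ == "__main__":
    # Six distinct passages support "Honolulu"; one poisoned passage says "Kenya".
    passages = [f"Barack Obama was born in Honolulu, Hawaii. (source {i})" for i in range(6)]
    passages.append("Barack Obama was born in Kenya.")
    print(is_confident("Honolulu", passages))
    print(is_confident("Kenya", passages))
```

Substring matching is the simplest possible check; a real system would likely also need alias handling, which the Figure 3 caption notes is a source of imperfect poisoning.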

Paper Structure (40 sections, 9 figures, 11 tables)

Introduction
Experimental Details
Data
Models
Proposed Method
Query Augmentation
Confidence from Answer Redundancy
Answer Resolution
Results
Can we use open-source LLMs as the query augmentation model?
How many augmented questions are needed for our approach to work well?
Why is performance not 0% at 100 poisoned documents?
Conclusion
Limitations
Realism of Proposed Setting
...and 25 more sections

Figures (9)

Figure 1: An example of a poisoning attack on an open-domain question answering (ODQA) pipeline with our method (Lower) vs a standard system (Upper). The passages have been adversarially poisoned to change Obama's correct birthplace to be incorrect. Our proposed defense method uses query augmentation to find new contexts that are less likely to be poisoned (#4 and #5). It then uses a novel confidence-based aggregation method (CAR) to predict the correct answer.
Figure 2: Number of new passages retrieved per augmented question (e.g., a question in the 100 bin would have 100 new contexts not retrieved by the original). Natural Questions is on top and TriviaQA on bottom.
Figure 3: Data poisoning and defense strategies using Atlas (lower figure) and FiD (upper figure); the appendix gives an equivalent table version of these plots. Left shows TriviaQA, right shows Natural Questions. C stands for context. With 100 poisoned articles, all contexts are poisoned; performance is non-zero because the models ignore the contexts or because the poisoning failed to recognize all aliases. Note that Redundancy greatly outperforms the majority vote baseline from Pan et al. (2023). Scores plateau after around 40 poisoned articles, as that is roughly when all 100 retrieved passages are poisoned (see the appendix for a discussion of article vs. passage).
Figure 4: An ablation on the number of augmented queries (and thus number of times retrieval is used) for the redundancy resolution method on Natural Questions 1-article FiD poisoning setting. As the number of augmented queries increases, so does the performance. Baseline performance is 50.1%, indicating that even just one augmented query provides significant gains.
Figure 5: An ablation comparing Confidence from Answer Redundancy (CAR) with exact match scores on the NQ 1-article poisoned setting. Predictions in the True bar have more than 5 unique passages that contain the predicted answer string.
...and 4 more figures
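
The Figure 3 caption contrasts a majority-vote baseline with the redundancy-based resolution. The captions do not spell out the resolution rule, so the sketch below shows only one plausible reading, assuming each question (the original plus its augmentations) yields a predicted answer and a boolean CAR confidence flag. The `Prediction` type, the function names, and the fallback order are hypothetical; the paper's Answer Resolution section is the authority on the actual rule.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Prediction:
    answer: str       # answer predicted for one (original or augmented) question
    confident: bool   # assumed CAR flag for that prediction


def majority_vote(predictions: list[Prediction]) -> str:
    """Baseline: return the most frequent answer, ignoring confidence."""
    counts = Counter(p.answer for p in predictions)
    return counts.most_common(1)[0][0]


def redundancy_resolution(original: Prediction, augmented: list[Prediction]) -> str:
    """Assumed rule: trust a confident original prediction; otherwise vote among
    confident augmented predictions, falling back to a plain vote over everything."""
    if original.confident:
        return original.answer
    confident = [p for p in augmented if p.confident]
    if confident:
        return majority_vote(confident)
    return majority_vote([original] + augmented)


if __name__ == "__main__":
    original = Prediction("Kenya", confident=False)
    augmented = [
        Prediction("Kenya", confident=False),
        Prediction("Kenya", confident=False),
        Prediction("Hawaii", confident=True),
        Prediction("Hawaii", confident=True),
    ]
    print(majority_vote([original] + augmented))
    print(redundancy_resolution(original, augmented))
```

In this toy case the poisoned answer wins a plain vote, while filtering on confidence first recovers the answer supported by redundant passages, which is the intuition behind weighting predictions by CAR.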
