Defending Against Disinformation Attacks in Open-Domain Question Answering
Orion Weller, Aleem Khan, Nathaniel Weir, Dawn Lawrie, Benjamin Van Durme
TL;DR
Open-domain QA systems are vulnerable to disinformation attacks that poison retrieval sources. The paper introduces a defense that combines query augmentation, which surfaces diverse and likely correct contexts, with Confidence from Answer Redundancy (CAR), which gauges when the augmented evidence should influence predictions. A redundancy-based resolution strategy outperforms baselines, delivering exact match (EM) gains of up to about 20% on Natural Questions and TriviaQA for FiD and Atlas, even when open-source LLMs are used for augmentation. The approach is gradient-free, applies broadly to existing pipelines, and shows practical potential as a defense against real-world disinformation in knowledge-intensive NLP tasks.
Abstract
Recent work in open-domain question answering (ODQA) has shown that adversarial poisoning of the search collection can cause large drops in accuracy for production systems. However, little to no work has proposed methods to defend against these attacks. To build such a defense, we rely on the intuition that redundant information often exists in large corpora. To find it, we introduce a method that uses query augmentation to search for a diverse set of passages that could answer the original question but are less likely to have been poisoned. We integrate these new passages into the model through a novel confidence method that compares the predicted answer to its appearance in the retrieved contexts (which we call Confidence from Answer Redundancy, or CAR). Together, these methods provide a simple but effective defense against poisoning attacks, yielding gains of nearly 20% exact match across varying levels of data poisoning and knowledge conflicts.
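To make the redundancy idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that confidence can be approximated by counting how many retrieved passages contain the predicted answer, and that low-confidence predictions are resolved by a vote over confident answers from augmented queries. The function names (`normalize`, `answer_redundancy`, `resolve`), the `min_support` threshold, and the voting rule are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def answer_redundancy(answer, passages):
    """Count the passages that contain the answer as a whole-word match."""
    target = normalize(answer)
    if not target:
        return 0
    padded = f" {target} "
    return sum(1 for p in passages if padded in f" {normalize(p)} ")


def resolve(original, augmented, min_support=2):
    """Pick a final answer from (answer, passages) pairs.

    Assumption: a prediction is "confident" when at least `min_support`
    of its own retrieved passages contain it. This threshold is
    illustrative, not a value taken from the paper.
    """
    answer, passages = original
    if answer_redundancy(answer, passages) >= min_support:
        return normalize(answer)

    # Otherwise, vote among answers from augmented queries that are
    # themselves well supported by their retrieved passages.
    votes = Counter(
        normalize(a)
        for a, ps in augmented
        if answer_redundancy(a, ps) >= min_support
    )
    if votes:
        return votes.most_common(1)[0][0]
    return normalize(answer)  # nothing better found; keep the original


# Toy example: the original retrieval is poisoned, the augmented one is not.
original = ("Lyon", ["The capital of France is Lyon.", "Paris is a large city."])
augmented = [
    ("Paris", [
        "Paris is the capital of France.",
        "The French capital, Paris, sits on the Seine.",
        "Paris hosts the national government.",
    ]),
]
final_answer = resolve(original, augmented)
```

In this toy case the poisoned answer appears in only one of its passages, so it falls below the assumed threshold and the well-supported answer from the augmented query is returned instead.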
