Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations

Sukmin Cho; Soyeong Jeong; Jeongyeon Seo; Taeho Hwang; Jong C. Park

TL;DR

This paper investigates the robustness of Retrieval-Augmented Generation (RAG) systems to real-world, low-level textual errors (typos) by introducing GARAG, a black-box genetic attack that optimizes dual objectives for retrieval and grounding. GARAG uses an NSGA-II-based search over perturbed documents to locate adversarial inputs that degrade both retrieval relevance and the generation probability of correct answers, and it reports attack success rates around 70% across multiple QA datasets and model configurations. The results show that minor perturbations can substantially degrade retrieval performance and grounding fidelity, causing notable drops in end-to-end QA accuracy, with the retriever often acting as a partial shield but not a full safeguard. The work highlights the need for holistic defenses against typographical noise in real-world data and provides insight into the types of perturbations most detrimental to RAG systems, guiding future robustness research.
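The dual-objective search mentioned above can be pictured with a small non-dominated (Pareto) filter, which is the basic building block of NSGA-II-style selection. This is only a sketch under stated assumptions, not the authors' implementation: the `Candidate` tuple, the function names, and the convention that lower scores are better for the attacker on both objectives are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (perturbed document, retrieval objective, grounding objective).
# Assumption: the attacker wants BOTH scores to be as low as possible.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [
        cand
        for cand in population
        if not any(dominates(other, cand) for other in population)
    ]


population = [
    ("doc_a", 0.9, 0.2),
    ("doc_b", 0.5, 0.5),
    ("doc_c", 0.6, 0.6),  # dominated by doc_b on both objectives
    ("doc_d", 0.2, 0.9),
]
front = pareto_front(population)
```

In this toy population, `doc_c` is dropped because `doc_b` is better on both objectives, while the other three represent different trade-offs between the two objectives and are all kept.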

Abstract

The robustness of recent Large Language Models (LLMs) has become increasingly crucial as their applicability expands across various domains and real-world applications. Retrieval-Augmented Generation (RAG) is a promising solution for addressing the limitations of LLMs, yet existing studies on the robustness of RAG often overlook the interconnected relationships between RAG components or the potential threats prevalent in real-world databases, such as minor textual errors. In this work, we investigate two underexplored aspects when assessing the robustness of RAG: 1) vulnerability to noisy documents through low-level perturbations and 2) a holistic evaluation of RAG robustness. Furthermore, we introduce a novel attack method, the Genetic Attack on RAG (GARAG), which targets these aspects. Specifically, GARAG is designed to reveal vulnerabilities within each component and test the overall system functionality against noisy documents. We validate RAG robustness by applying GARAG to standard QA datasets, incorporating diverse retrievers and LLMs. The experimental results show that GARAG consistently achieves high attack success rates. It also severely degrades the performance of each component and their synergy, highlighting the substantial risk that minor textual inaccuracies pose in disrupting RAG systems in the real world.
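To make "low-level perturbations" concrete, the sketch below injects character-level typos into a document. It is an illustration under assumptions, not the paper's procedure: the four operations (delete, insert, substitute, swap), the function name `perturb`, and the choice to perturb only alphabetic characters are hypothetical. The parameter name `pr_pert` follows the perturbation-rate symbol that appears in the Figure 3 caption.

```python
import random
import string


def perturb(text: str, pr_pert: float, rng: random.Random) -> str:
    """Apply a random typo to each alphabetic character with probability pr_pert.

    Assumed typo operations (illustrative only): delete the character, insert a
    random letter after it, substitute it, or swap it with the next character.
    """
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch.isalpha() and rng.random() < pr_pert:
            op = rng.choice(("delete", "insert", "substitute", "swap"))
            if op == "delete":
                pass  # drop the character
            elif op == "insert":
                out.append(ch)
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
            elif i + 1 < len(chars):
                # swap with the following character and skip past it
                out.append(chars[i + 1])
                out.append(ch)
                i += 1
            else:
                out.append(ch)  # nothing to swap with at the end of the text
        else:
            out.append(ch)
        i += 1
    return "".join(out)


doc = "Retrieval-Augmented Generation grounds answers in retrieved documents."
clean = perturb(doc, 0.0, random.Random(0))   # rate 0 leaves the text untouched
noisy = perturb(doc, 0.2, random.Random(0))   # seeded for reproducibility
noisy_again = perturb(doc, 0.2, random.Random(0))
```

Passing a seeded `random.Random` keeps the perturbation reproducible, which matters when the same candidate document has to be re-scored across iterations of a search.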

Paper Structure (48 sections, 2 equations, 6 figures, 10 tables, 2 algorithms)

Introduction
Related Work
Robustness in RAG
Adversarial Attacks in NLP
Method
Problem Formulation
Pipeline of RAG.
Adversarial Document Generation.
Attack Objective on RAG.
GARAG: Genetic Attack on RAG
Initialization.
Crossover & Mutation.
Selection.
Experimental Setup
Model
...and 33 more sections

Figures (6)

Figure 1: Impact of noisy documents in real-world databases on the RAG system: The retriever selects a noisy document, causing the reader to produce incorrect answers.
Figure 2: (Left) The search space formulated by our proposed attack objectives, $\mathcal{L}_{\textnormal{RSR}}$ and $\mathcal{L}_{\textnormal{GPR}}$. (Right) An overview of the iterative process implemented by our proposed method, GARAG.
Figure 3: Adversarial attack analysis on the NQ dataset using Contriever and Llama2-7b: (Left) Variations in ASR and EM scores as the $pr_{\textnormal{pert}}$ increases from 0 to 0.9, with ASR shown in blue and EM in red. (Center) Variations in ASR and EM scores across increasing iterations ($N_{\textnormal{iter}}$), also indicated in blue and red respectively. (Right) Distribution of correctness among predictions depending on $\mathcal{L}_{\textnormal{GPR}}$.
Figure 4: Confusion matrices of prediction from $\bm{d^*}$ across EM and Acc. on NQ with Contriever.
Figure 5: Distribution of grammatically correct documents among $\bm{d^*}$ on NQ with the Contriever and Llama2-7b.
...and 1 more figure
