EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations

Xinyun Zhou, Xinfeng Li, Yinan Peng, Ming Xu, Xuanwang Zhang, Miao Yu, Yidong Wang, Xiaojun Jia, Kun Wang, Qingsong Wen, XiaoFeng Wang, Wei Dong

TL;DR

The paper uncovers a critical vulnerability in Retrieval-Augmented Generation (RAG) systems: subtle symbolic perturbations, especially emoticons, can hijack the retrieval process. In experiments across general QA and code tasks with multiple retrievers and generators, the authors show near-100% attack success when emoticons perturb queries or injected texts, particularly when the emoticon is placed at the start of a query. They analyze the underlying mechanisms (rare-token embeddings, insertion-induced positional shifts, and high-dimensional amplification) and propose defenses such as paraphrasing-based query disinfection and perturbed-text detection, releasing open-source datasets and models alongside them. The work argues that robustness assumptions in RAG systems need to be rethought and outlines practical directions for building safer, more trustworthy retrieval-augmented AI systems.

Abstract

Retrieval-Augmented Generation (RAG) systems are increasingly central to robust AI, enhancing large language model (LLM) faithfulness by incorporating external knowledge. However, our study unveils a critical, overlooked vulnerability: their profound susceptibility to subtle symbolic perturbations, particularly near-imperceptible emoticon tokens such as "(@_@)" that can catastrophically mislead retrieval, a phenomenon we term EmoRAG. We demonstrate that injecting a single emoticon into a query makes it nearly 100% likely to retrieve semantically unrelated texts that contain a matching emoticon. Our extensive experiments across general question-answering and code domains, using a range of state-of-the-art retrievers and generators, reveal three key findings: (I) Single-Emoticon Disaster: Minimal emoticon injections cause maximal disruptions, with a single emoticon dominating RAG output in almost 100% of cases. (II) Positional Sensitivity: Placing an emoticon at the beginning of a query can cause severe perturbation, with F1-Scores exceeding 0.92 across all datasets. (III) Parameter-Scale Vulnerability: Counterintuitively, models with more parameters exhibit greater vulnerability to the interference. We provide an in-depth analysis to uncover the underlying mechanisms of these phenomena. Furthermore, we raise a critical concern regarding the robustness assumption of current RAG systems, envisioning a threat scenario where an adversary exploits this vulnerability to manipulate the RAG system. We evaluate standard defenses and find them insufficient against EmoRAG. To address this, we propose targeted defenses, analyzing their strengths and limitations in mitigating emoticon-based perturbations. Finally, we outline future directions for building robust RAG systems.
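
To give a rough sense of what detecting an emoticon-perturbed query could involve, the sketch below flags whitespace-delimited tokens that consist almost entirely of punctuation or symbol characters, such as "(@_@)". This is a hypothetical heuristic written for illustration and is not the detector proposed in the paper; the function names, the minimum length, and the 0.8 symbol-ratio threshold are all assumptions.

```python
import unicodedata


def looks_like_emoticon(token: str, min_len: int = 3,
                        min_symbol_ratio: float = 0.8) -> bool:
    """Hypothetical heuristic: True if the token is mostly punctuation/symbols.

    min_len and min_symbol_ratio are illustrative assumptions, not values
    taken from the paper.
    """
    if len(token) < min_len:
        return False
    # Unicode general categories starting with "P" (punctuation) or
    # "S" (symbol) cover characters such as ( @ _ ) ^ ~.
    symbols = sum(1 for ch in token
                  if unicodedata.category(ch)[0] in ("P", "S"))
    return symbols / len(token) >= min_symbol_ratio


def find_emoticons(text: str) -> list[str]:
    """Return the whitespace-delimited tokens flagged by the heuristic."""
    return [tok for tok in text.split() if looks_like_emoticon(tok)]


def strip_emoticons(text: str) -> str:
    """Drop flagged tokens and re-join the remainder with single spaces."""
    return " ".join(tok for tok in text.split()
                    if not looks_like_emoticon(tok))


if __name__ == "__main__":
    query = "(@_@) Who wrote Hamlet?"
    print(find_emoticons(query))
    print(strip_emoticons(query))
```

A filter this simple also flags ordinary symbol runs such as "..." or "===", so it would produce false positives on code-domain text, and it misses emoticons that are attached to a word without whitespace.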

Paper Structure

This paper contains 40 sections, 3 equations, 18 figures, 9 tables.

Figures (18)

  • Figure 1: Illustration of emoticon-based perturbation hijacking a RAG system. Step ①: The user submits a query. Step ②: The retriever processes the query. Step ③: The retriever passes the context to the LLM. Step ④: The LLM generates a response. Step ⑤: The user receives the response.
  • Figure 2: Perturbed effect of EmoRAG on the Code domain-specific retriever
  • Figure 2: Impact of 96 emoticons with diverse structures, frequencies, and meanings on EmoRAG
  • Figure 3: The impact of increasing $N$ and $k$ on ASR, Precision, Recall, and F1-Score in the NQ dataset.
  • Figure 4: The impact of varying the number of injected emoticons on the F1-Score across multiple datasets with Contriever as the retriever.
  • ...and 13 more figures
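
The Precision, Recall, and F1-Score named in the Figure 3 caption are not defined in this summary. A minimal sketch, assuming the convention common in retrieval-poisoning evaluations: $N$ is the number of injected texts for a query, $k$ is the number of texts retrieved, Precision is the fraction of the top-$k$ results that are injected, and Recall is the fraction of the $N$ injected texts that appear in the top-$k$. The function name and document IDs below are hypothetical. ASR is left out because it depends on the generator's output, not on retrieval alone.

```python
def injected_retrieval_metrics(retrieved_ids, injected_ids):
    """Precision, recall, and F1 of injected texts among the retrieved texts.

    retrieved_ids: the top-k document IDs returned for one query
                   (assumed to be unique).
    injected_ids:  the N document IDs the adversary injected for that query.
    These definitions are an assumption, not quoted from the paper.
    """
    retrieved = list(retrieved_ids)
    injected = set(injected_ids)

    # Count retrieved documents that are adversarial injections.
    hits = sum(1 for doc_id in retrieved if doc_id in injected)

    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(injected) if injected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: k = 5 retrieved texts, N = 10 injected texts,
    # 3 of the retrieved texts are injected.
    top_k = ["inj1", "doc7", "inj2", "inj3", "doc2"]
    injected = [f"inj{i}" for i in range(1, 11)]
    p, r, f1 = injected_retrieval_metrics(top_k, injected)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Under these definitions Recall cannot exceed $k/N$ when $N > k$, and Precision cannot exceed $N/k$ when $N < k$, so the two metrics move differently as $N$ and $k$ change.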