RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation

Vasilije Stambolic; Aritra Dhar; Lukas Cavigelli

TL;DR

RAG-Pull investigates a new class of imperceptible attacks on retrieval-augmented code-generation systems: invisible Unicode characters are inserted into queries and/or code targets to bias embedding-based retrieval toward attacker-controlled snippets. The attack operates in a fully black-box setting, using differential evolution to maximize the embedding similarity $s(E(\delta_q), E(\delta_t))$ so that the adversarial target appears in the top-$K$ retrieved documents and is then reproduced by the LLM. The authors formalize three attack scenarios (perturbing the query, perturbing the target, and perturbing both), evaluate across three datasets and two languages, and report retrieval success rates approaching 100% along with end-to-end vulnerability reproduction in many cases, highlighting a real risk to safety alignment in RAG systems. They also analyze defense strategies (e.g., sanitizing invisible characters or Unicode-aware tokenization) and emphasize how fragile embedding-based retrieval is under small, imperceptible perturbations, which motivates Unicode-aware defenses and further study of RAG robustness in code-generation settings.
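To make the optimization loop concrete, here is a minimal sketch of the query-only scenario. It is not the authors' implementation. The `embed` function is a toy hashed-trigram stand-in for the black-box embedding model $E$, and the `INVISIBLE` alphabet, the gene encoding, the budget of 8 insertions, and the differential-evolution hyperparameters are all assumptions chosen for illustration.

```python
import math
import random
import zlib

# Zero-width code points used as the perturbation alphabet (illustrative choice).
INVISIBLE = ["\u200b", "\u200c", "\u200d", "\u2060"]

def embed(text, dim=64):
    """Toy stand-in for a black-box embedder E: hashed character trigrams, L2-normalised."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        vec[zlib.crc32(text[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Similarity s(., .) for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def apply_perturbation(text, genes):
    """Decode genes in [0, 1] as (position, character) pairs and insert them into text."""
    inserts = []
    for k in range(0, len(genes), 2):
        pos = min(int(genes[k] * (len(text) + 1)), len(text))
        ch = INVISIBLE[min(int(genes[k + 1] * len(INVISIBLE)), len(INVISIBLE) - 1)]
        inserts.append((pos, ch))
    # Insert from right to left so earlier positions stay valid.
    for pos, ch in sorted(inserts, key=lambda p: p[0], reverse=True):
        text = text[:pos] + ch + text[pos:]
    return text

def differential_evolution(fitness, n_genes, pop_size=20, generations=40,
                           f=0.8, cr=0.7, seed=0):
    """Plain rand/1/bin differential evolution that maximises `fitness` over [0, 1]^n."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_genes)  # guarantees at least one mutated gene
            trial = []
            for g in range(n_genes):
                if rng.random() < cr or g == j_rand:
                    v = pop[a][g] + f * (pop[b][g] - pop[c][g])
                    trial.append(min(1.0, max(0.0, v)))
                else:
                    trial.append(pop[i][g])
            s = fitness(trial)
            if s >= scores[i]:  # greedy selection: keep the trial only if it is no worse
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]

# Hypothetical query and attacker-chosen target, for illustration only.
query = "how do I parse a yaml config file in python"
target = "import yaml\ncfg = yaml.load(open(path))"
target_vec = embed(target)

def fitness(genes):
    """Black-box objective: similarity of the perturbed query to the fixed target."""
    return cosine(embed(apply_perturbation(query, genes)), target_vec)

baseline = cosine(embed(query), target_vec)
best_genes, best_score = differential_evolution(fitness, n_genes=2 * 8)
adversarial_query = apply_perturbation(query, best_genes)
print(f"baseline similarity: {baseline:.3f}, after search: {best_score:.3f}")
```

The search only ever calls `fitness`, which mirrors the black-box assumption: no gradients or model internals are needed, only similarity scores. Perturbing the target, or both sides, would use the same loop with a different decoding of the genes.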

Abstract

Retrieval-Augmented Generation (RAG) increases the reliability and trustworthiness of LLM responses and reduces hallucination without requiring model retraining, by adding external data to the LLM's context. We develop a new class of black-box attack, RAG-Pull, that inserts hidden Unicode characters into queries or external code repositories, redirecting retrieval toward malicious code and thereby breaking the models' safety alignment. We observe that query and code perturbations alone can shift retrieval toward attacker-controlled snippets, while combined query-and-target perturbations achieve near-perfect success. Once retrieved, these snippets introduce exploitable vulnerabilities such as remote code execution and SQL injection. RAG-Pull's minimal perturbations can alter the model's safety alignment and increase its preference for unsafe code, thereby opening up a new class of attacks on LLMs.
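As an illustration of the sanitization defense mentioned in the TL;DR, the sketch below strips hidden characters from text before it reaches the embedding model. It is one possible filter rather than the defense evaluated in the paper. The assumption here is that the inserted characters fall in Unicode general category `Cf` (format), which covers zero-width spaces and joiners; the helper name `strip_invisible` is hypothetical.

```python
import unicodedata

def strip_invisible(text):
    """Drop format-category (Cf) code points such as zero-width spaces and joiners.

    Assumption: the hidden characters are in category Cf. Other invisible
    code points (for example variation selectors) would need extra rules.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

# Hypothetical poisoned query: it renders like the clean string but embeds differently.
poisoned = "sel\u200bect * fr\u200dom users"
cleaned = strip_invisible(poisoned)
print(len(poisoned), len(cleaned))
```

Applying the same filter to both incoming queries and indexed documents keeps the two sides of the similarity computation consistent. The trade-off is that category `Cf` also contains legitimate characters, such as the zero-width joiner used in emoji sequences and bidirectional controls, so a blanket filter may alter some non-malicious text.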
