Exploiting Web Search Tools of AI Agents for Data Exfiltration

Dennis Rall, Bernhard Bauer, Mohit Mittal, Thomas Fraunholz

TL;DR

This work investigates the vulnerability of LLMs integrated with Retrieval-Augmented Generation and web-search tools to indirect prompt injection attacks. It presents an end-to-end exploit framework using a RAG agent, hidden prompt injections, and a diverse attack taxonomy (including fuzzing, encoding, and Unicode obfuscation) to exfiltrate data, evaluated across many models with a standardized, open framework. The findings show that several vendors' models remain susceptible, with resilience not tightly tied to model size, and that well-known attack templates continue to be effective. The study advocates defense-in-depth, proactive security training, and a centralized repository of attack vectors to drive secure, design-first LLM development and deployment in enterprise environments.

Abstract

Large language models (LLMs) are now routinely used to autonomously execute complex tasks, from natural language processing to dynamic workflows like web searches. The use of tool-calling and Retrieval-Augmented Generation (RAG) allows LLMs to process and retrieve sensitive corporate data, amplifying both their functionality and their vulnerability to abuse. As LLMs increasingly interact with external data sources, indirect prompt injection emerges as a critical and evolving attack vector, enabling adversaries to exploit models through manipulated inputs. Through a systematic evaluation of indirect prompt injection attacks across diverse models, we analyze how susceptible current LLMs are to such attacks, which parameters (including model size, manufacturer, and specific implementation) shape their vulnerability, and which attack methods remain most effective. Our results reveal that even well-known attack patterns continue to succeed, exposing persistent weaknesses in model defenses. To address these vulnerabilities, we emphasize the need for strengthened training procedures to enhance inherent resilience, a centralized database of known attack vectors to enable proactive defense, and a unified testing framework to ensure continuous security validation. These steps are essential to push developers toward integrating security into the core design of LLMs, as our findings show that current models still fail to mitigate long-standing threats.

Paper Structure

This paper contains 15 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Attack Scenario Illustrating Indirect Prompt Injection for Data Exfiltration. An AI agent, equipped with web search capabilities and access to a company’s internal knowledge base, is manipulated by a malicious website to exfiltrate sensitive information. The attacker embeds hidden instructions in the website, which the agent processes during a routine user-initiated web search. Following these instructions, the agent retrieves sensitive company data and transmits it to an attacker-controlled log server via a web request, using the same web search tool. This scenario demonstrates the vulnerability of AI-driven workflows to indirect prompt injection attacks.
  • Figure 2: Bar plot showing the attack success rates of the models. The blue bars represent the attack success rates of all runs, and the orange bars focus only on the base runs of the templates without variations.
  • Figure 3: Attack success rate of the different variations. The IdentityPromptConverter is the identity function and marks the base template.
  • Figure 4: Attack success rate of the twenty most effective templates.
  • Figure 5: Scatter plot showing the attack success rate of the models compared to the number of parameters, where known.
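
The attack success rates in Figures 2 and 3 can be read as simple ratios of successful runs to total runs, computed once over all runs and once over the base runs only. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The `Run` record, its field names, and the sample data are assumptions made for illustration; only the name `IdentityPromptConverter` (the marker for an unmodified base template) comes from the figure captions.

```python
from collections import defaultdict
from dataclasses import dataclass

# Name taken from the Figure 3 caption: the identity variation marks a base run.
BASE_CONVERTER = "IdentityPromptConverter"


@dataclass(frozen=True)
class Run:
    """One attack attempt (hypothetical record layout, assumed for this sketch)."""
    model: str
    template: str
    converter: str  # variation applied to the template
    success: bool   # True if the attempt was judged successful


def success_rate(runs):
    """Fraction of successful runs, or None when there are no runs."""
    runs = list(runs)
    if not runs:
        return None
    return sum(r.success for r in runs) / len(runs)


def rates_by_model(runs):
    """Per model: success rate over all runs and over base runs only."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.model].append(run)
    return {
        model: {
            "all": success_rate(model_runs),
            "base": success_rate(
                r for r in model_runs if r.converter == BASE_CONVERTER
            ),
        }
        for model, model_runs in grouped.items()
    }


if __name__ == "__main__":
    # Made-up sample data; model, template, and variation names are placeholders.
    sample = [
        Run("model-a", "t1", BASE_CONVERTER, True),
        Run("model-a", "t1", "hypothetical-variation", False),
        Run("model-a", "t2", BASE_CONVERTER, False),
        Run("model-a", "t2", "hypothetical-variation", False),
        Run("model-b", "t1", BASE_CONVERTER, False),
    ]
    for model, rates in rates_by_model(sample).items():
        print(model, rates)
```

Keeping the base rate separate from the overall rate mirrors the two bar colors described for Figure 2, and makes it possible to see whether variations raise or lower success relative to the unmodified templates.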