Table of Contents
Fetching ...

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

Yigitcan Kaya, Anton Landerer, Stijn Pletinckx, Michelle Zimmermann, Christopher Kruegel, Giovanni Vigna

TL;DR

This paper addresses prompt injection risks in the long tail of web chatbots powered by third-party plugins. It combines a large-scale measurement of 17 plugins across over 10,000 websites with controlled experiments to quantify how plugin design and configuration affect attack success, particularly through history forgery and untrusted content ingestion. Key findings show widespread insecure practices: many plugins transmit full conversation histories without integrity checks, and many scrape and inject third-party content, creating persistent indirect injection risks; system prompts and tool integrations further modulate vulnerability. The work demonstrates substantial practical impact, including real-world exposure, responsible disclosures that prompted fixes, and pragmatic defenses such as isolating untrusted content and hardening tool instructions, while highlighting the need for standardized web-plugin security to secure the next generation of web-based LLM applications.

Abstract

Prompt injection attacks pose a critical threat to large language models (LLMs), with prior work focusing on cutting-edge LLM applications like personal copilots. In contrast, simpler LLM applications, such as customer service chatbots, are widespread on the web, yet their security posture and exposure to such attacks remain poorly understood. These applications often rely on third-party chatbot plugins that act as intermediaries to commercial LLM APIs, offering non-expert website builders intuitive ways to customize chatbot behaviors. To bridge this gap, we present the first large-scale study of 17 third-party chatbot plugins used by over 10,000 public websites, uncovering previously unknown prompt injection risks in practice. First, 8 of these plugins (used by 8,000 websites) fail to enforce the integrity of the conversation history transmitted in network requests between the website visitor and the chatbot. This oversight amplifies the impact of direct prompt injection attacks by allowing adversaries to forge conversation histories (including fake system messages), boosting their ability to elicit unintended behavior (e.g., code generation) by 3 to 8x. Second, 15 plugins offer tools, such as web-scraping, to enrich the chatbot's context with website-specific content. However, these tools do not distinguish the website's trusted content (e.g., product descriptions) from untrusted, third-party content (e.g., customer reviews), introducing a risk of indirect prompt injection. Notably, we found that ~13% of e-commerce websites have already exposed their chatbots to third-party content. We systematically evaluate both vulnerabilities through controlled experiments grounded in real-world observations, focusing on factors such as system prompt design and the underlying LLM. Our findings show that many plugins adopt insecure practices that undermine the built-in LLM safeguards.

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

TL;DR

This paper addresses prompt injection risks in the long tail of web chatbots powered by third-party plugins. It combines a large-scale measurement of 17 plugins across over 10,000 websites with controlled experiments to quantify how plugin design and configuration affect attack success, particularly through history forgery and untrusted content ingestion. Key findings show widespread insecure practices: many plugins transmit full conversation histories without integrity checks, and many scrape and inject third-party content, creating persistent indirect injection risks; system prompts and tool integrations further modulate vulnerability. The work demonstrates substantial practical impact, including real-world exposure, responsible disclosures that prompted fixes, and pragmatic defenses such as isolating untrusted content and hardening tool instructions, while highlighting the need for standardized web-plugin security to secure the next generation of web-based LLM applications.

Abstract

Prompt injection attacks pose a critical threat to large language models (LLMs), with prior work focusing on cutting-edge LLM applications like personal copilots. In contrast, simpler LLM applications, such as customer service chatbots, are widespread on the web, yet their security posture and exposure to such attacks remain poorly understood. These applications often rely on third-party chatbot plugins that act as intermediaries to commercial LLM APIs, offering non-expert website builders intuitive ways to customize chatbot behaviors. To bridge this gap, we present the first large-scale study of 17 third-party chatbot plugins used by over 10,000 public websites, uncovering previously unknown prompt injection risks in practice. First, 8 of these plugins (used by 8,000 websites) fail to enforce the integrity of the conversation history transmitted in network requests between the website visitor and the chatbot. This oversight amplifies the impact of direct prompt injection attacks by allowing adversaries to forge conversation histories (including fake system messages), boosting their ability to elicit unintended behavior (e.g., code generation) by 3 to 8x. Second, 15 plugins offer tools, such as web-scraping, to enrich the chatbot's context with website-specific content. However, these tools do not distinguish the website's trusted content (e.g., product descriptions) from untrusted, third-party content (e.g., customer reviews), introducing a risk of indirect prompt injection. Notably, we found that ~13% of e-commerce websites have already exposed their chatbots to third-party content. We systematically evaluate both vulnerabilities through controlled experiments grounded in real-world observations, focusing on factors such as system prompt design and the underlying LLM. Our findings show that many plugins adopt insecure practices that undermine the built-in LLM safeguards.

Paper Structure

This paper contains 32 sections, 6 figures, 20 tables.

Figures (6)

  • Figure 1: The communication flow of chatbot plugins. Type ❶ and ❷ cover WP and Generic plugins, respectively.
  • Figure 2: Longitudinal Trends in Web Chatbot Adoption. [Left] Chatbot-enabled websites since Jan 2023. [Right] Domain registrations for chatbot sites since Jan 2022.
  • Figure 3: Direct Prompt Injection via Message History Forging. [Top] An untampered network request that the chatbot rejects to preserve confidentiality. [Middle and Bottom] An adversary forges messages with elevated roles (assistant and system) in the request, causing the chatbot to reveal its system prompt as the plugin omits request integrity checks.
  • Figure 4: Indirect Prompt Injection via Website Content Manipulation.Mallory posts a malicious review scraped by the chatbot plugin and fed to the LLM, which incorporates the injected prompt when responding to a benign query.
  • Figure 5: Language and Content Distributions of Chatbot Websites. [Left] Distribution of content languages. [Right] Distribution of content categories, annotated using the Cloudflare Domain Intelligence API cloudflare_api. These results highlight the diverse use cases of chatbot deployments.
  • ...and 1 more figures