Table of Contents
Fetching ...

Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications

Stav Cohen, Ron Bitton, Ben Nassi

TL;DR

The paper analyzes ecosystem-scale security risks for GenAI-powered apps that rely on RAG-based inference, introducing Morris-II, a worm-like self-replicating prompt that propagates across an ecosystem to perform malicious actions and extract sensitive data. It formalizes the adversarial self-replicating prompts and presents an end-to-end evaluation using Enron emails, showing how context, embeddings, and hop count influence propagation. To counter this, the authors propose Virtual Donkey, a light-weight guardrail that detects adversarial prompts via input-output similarity and achieves near-perfect detection with low false positives, including robust performance on out-of-distribution data. The work highlights practical defense strategies and releases tooling to help defenders secure GenAI ecosystems against prompt-based ecosystem attacks.

Abstract

In this paper, we show that when the communication between GenAI-powered applications relies on RAG-based inference, an attacker can initiate a computer worm-like chain reaction that we call Morris-II. This is done by crafting an adversarial self-replicating prompt that triggers a cascade of indirect prompt injections within the ecosystem and forces each affected application to perform malicious actions and compromise the RAG of additional applications. We evaluate the performance of the worm in creating a chain of confidential user data extraction within a GenAI ecosystem of GenAI-powered email assistants and analyze how the performance of the worm is affected by the size of the context, the adversarial self-replicating prompt used, the type and size of the embedding algorithm employed, and the number of hops in the propagation. Finally, we introduce the Virtual Donkey, a guardrail intended to detect and prevent the propagation of Morris-II with minimal latency, high accuracy, and a low false-positive rate. We evaluate the guardrail's performance and show that it yields a perfect true-positive rate of 1.0 with a false-positive rate of 0.015, and is robust against out-of-distribution worms, consisting of unseen jailbreaking commands, a different email dataset, and various worm usecases.

Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications

TL;DR

The paper analyzes ecosystem-scale security risks for GenAI-powered apps that rely on RAG-based inference, introducing Morris-II, a worm-like self-replicating prompt that propagates across an ecosystem to perform malicious actions and extract sensitive data. It formalizes the adversarial self-replicating prompts and presents an end-to-end evaluation using Enron emails, showing how context, embeddings, and hop count influence propagation. To counter this, the authors propose Virtual Donkey, a light-weight guardrail that detects adversarial prompts via input-output similarity and achieves near-perfect detection with low false positives, including robust performance on out-of-distribution data. The work highlights practical defense strategies and releases tooling to help defenders secure GenAI ecosystems against prompt-based ecosystem attacks.

Abstract

In this paper, we show that when the communication between GenAI-powered applications relies on RAG-based inference, an attacker can initiate a computer worm-like chain reaction that we call Morris-II. This is done by crafting an adversarial self-replicating prompt that triggers a cascade of indirect prompt injections within the ecosystem and forces each affected application to perform malicious actions and compromise the RAG of additional applications. We evaluate the performance of the worm in creating a chain of confidential user data extraction within a GenAI ecosystem of GenAI-powered email assistants and analyze how the performance of the worm is affected by the size of the context, the adversarial self-replicating prompt used, the type and size of the embedding algorithm employed, and the number of hops in the propagation. Finally, we introduce the Virtual Donkey, a guardrail intended to detect and prevent the propagation of Morris-II with minimal latency, high accuracy, and a low false-positive rate. We evaluate the guardrail's performance and show that it yields a perfect true-positive rate of 1.0 with a false-positive rate of 0.015, and is robust against out-of-distribution worms, consisting of unseen jailbreaking commands, a different email dataset, and various worm usecases.
Paper Structure (24 sections, 3 equations, 18 figures)

This paper contains 24 sections, 3 equations, 18 figures.

Figures (18)

  • Figure 1: Morris-II propagates from $u_1$ to $u_2$ to $u_3$.
  • Figure 2: The influence of the prefix of the worm (top) and the embeddings algorithm used (bottom).
  • Figure 3: The retrieval success rate, replication success rate, replication & payload success rate and combined success rate for the three propagation ways of the worm: via a generated a new email based on subject, via the enrichment of content of a given email body, and via a generated response.
  • Figure 4: The templates of the query sent by the client to the GenAI engine to: generate a draft for a new email based on a subject (top), enrich the content of a given text of an email (middle), and generate a draft for a response. The text in purple represents a variable that the client replaces.
  • Figure 5: The influence of the number of hops of the propagation (top) and the GenAI engine employed (bottom).
  • ...and 13 more figures