A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares

Stav Cohen, Ron Bitton, Ben Nassi

TL;DR

The paper addresses the security risk posed by jailbroken GenAI models in GenAI-powered applications, introducing PromptWare as a new attack class that flips a model from serving an application to attacking it. It defines naive PromptWare and Advanced PromptWare Threat (APwT) across threat models where the application's logic is known or unknown, and demonstrates two case studies: a DoS attack via an infinite loop and a malicious SQL alteration in an e-commerce chatbot. The authors show how adversarial prompts can be embedded into user inputs to exploit Plan & Execute architectures, highlighting the vulnerability of real-time planning and orchestration. They propose countermeasures such as input length limits, rate limiting, and detection of jailbreaking and adversarial prompts, urging the security community to address these risks as GenAI integration widens. The work calls for a paradigm shift in defending GenAI-powered applications against PromptWare and emphasizes the need for isolation and robust defenses in inference-time systems.
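Two of the countermeasures named above, input length limits and rate limiting, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `MAX_INPUT_CHARS`, `RateLimiter`, and `accept_user_input`, and all threshold values, are hypothetical.

```python
import time
from collections import deque

# Hypothetical threshold; a real application would tune this value.
MAX_INPUT_CHARS = 2000


class RateLimiter:
    """Sliding-window limiter: at most max_calls per window_seconds."""

    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True


def accept_user_input(text, limiter, now=None):
    """Reject over-long inputs first, then apply the rate limit."""
    if len(text) > MAX_INPUT_CHARS:
        return False
    return limiter.allow(now)
```

A length limit shrinks the room available for an embedded jailbreak prompt, and a rate limit bounds how many GenAI engine calls one user can trigger. Neither detects a PromptWare by itself, so they would complement the detection of jailbreaking and adversarial prompts rather than replace it.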

Abstract

In this paper, we argue that a jailbroken GenAI model can cause substantial harm to GenAI-powered applications and facilitate PromptWare, a new type of attack that flips the GenAI model's behavior from serving an application to attacking it. PromptWare exploits user inputs to jailbreak a GenAI model and force it to perform malicious activity within the context of a GenAI-powered application. First, we introduce a naive implementation of PromptWare that behaves as malware targeting Plan & Execute architectures (a.k.a. ReAct, function calling). We show that attackers can force a desired execution flow by crafting a user input that produces the desired outputs, provided that the logic of the GenAI-powered application is known to them. We demonstrate a DoS attack that drives a GenAI-powered assistant into an infinite loop, wasting money and computational resources on redundant API calls to a GenAI engine and preventing the application from serving the user. Next, we introduce a more sophisticated implementation of PromptWare, which we name Advanced PromptWare Threat (APwT), that targets GenAI-powered applications whose logic is unknown to attackers. We show that attackers can create user input that exploits the GenAI engine's advanced AI capabilities to launch a six-step kill chain at inference time: escalate privileges, analyze the application's context, identify valuable assets, reason about possible malicious activities, decide on one of them, and execute it. We demonstrate APwT against a GenAI-powered e-commerce chatbot and show that it can trigger the modification of SQL tables, potentially leading to unauthorized discounts on the items sold to the user.
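The DoS scenario described above can be illustrated with a toy plan loop. This is a hedged sketch, not the paper's code: `run_plan`, `benign_engine`, and `jailbroken_engine` are hypothetical stand-ins, the engine is a local stub rather than a real GenAI API, and the `max_steps` budget is an illustrative guard added here as an assumption.

```python
def run_plan(engine, user_input, max_steps=10):
    """Toy plan: keep querying the engine until it approves the result.

    `engine` is a stub callable standing in for a GenAI API call.
    `max_steps` is an assumed guard that bounds the number of calls.
    """
    calls = 0
    approved = False
    while not approved:
        if calls >= max_steps:
            # Budget exhausted: abort instead of looping forever.
            return {"status": "aborted", "calls": calls}
        reply = engine(user_input)
        calls += 1
        # Any reply other than "OK" sends the plan back to the same step.
        approved = reply == "OK"
    return {"status": "completed", "calls": calls}


def benign_engine(text):
    """Stub engine that approves on the first call."""
    return "OK"


def jailbroken_engine(text):
    """Stub engine forced to always return the retry-triggering output."""
    return "RETRY"


print(run_plan(benign_engine, "draft a reply"))
print(run_plan(jailbroken_engine, "input carrying a PromptWare", max_steps=5))
```

Without the step budget, the second call would never return, and every iteration would be a billed request to the engine. The budget only caps the cost of the loop; it does not detect or prevent the jailbreak that causes it.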

Paper Structure (31 sections, 5 figures)

Figures (5)

  • Figure 1: (1) A PromptWare is provided (via user input) to a GenAI-powered application and is appended to a query (2) that is sent to a GenAI engine. The PromptWare (3) jailbreaks the GenAI engine and (4) instructs it to (5) return a specific output which (6) forces a malicious outcome of the GenAI-powered application.
  • Figure 2: GenAI-powered application based on a plan & execute framework.
  • Figure 3: The finite state machine associated with the plan presented in Listing \ref{listing-plan}.
  • Figure 4: The scheme of the DoS attack.
  • Figure 5: The scheme of the Autonomous Prompt Threat.