PRSA: Prompt Stealing Attacks against Real-World Prompt Services

Yong Yang, Changjiang Li, Qingming Li, Oubo Ma, Haoyu Wang, Zonghui Wang, Yandong Gao, Wenzhi Chen, Shouling Ji

TL;DR

PRSA presents a practical two-phase framework for prompt stealing attacks against real-world prompt services. It shows that adversaries can infer detailed prompt intent from a limited number of input-output pairs and generate stolen prompts with high functional similarity to the originals. It uses one-shot prompt attention to identify key factors and two-step prompt pruning to keep the stolen prompts general. Evaluations on PromptBase and the GPT Store show that, at low cost, it raises the attack success rate over prior work from 17.8% to 46.1% on prompt marketplaces and from 39% to 52% on LLM application stores. The work links higher mutual information between a prompt and its outputs to greater leakage risk, and it discusses defenses such as output obfuscation and watermarking while acknowledging their limitations. These insights highlight real-world intellectual property risks for prompt developers and underscore the need for robust protective measures.

Abstract

Recently, large language models (LLMs) have garnered widespread attention for their exceptional capabilities. Prompts are central to the functionality and performance of LLMs, making them highly valuable assets. The increasing reliance on high-quality prompts has driven significant growth in prompt services. However, this growth also expands the potential for prompt leakage, increasing the risk that attackers could replicate original functionalities, create competing products, and severely infringe on developers' intellectual property. Despite these risks, prompt leakage in real-world prompt services remains underexplored. In this paper, we present PRSA, a practical attack framework designed for prompt stealing. PRSA infers the detailed intent of prompts by analyzing a very limited number of input-output pairs and can successfully generate stolen prompts that replicate the original functionality. Extensive evaluations demonstrate PRSA's effectiveness across the two main types of real-world prompt services. Specifically, compared to previous works, it improves the attack success rate from 17.8% to 46.1% in prompt marketplaces and from 39% to 52% in LLM application stores. Notably, in the attack on "Math", one of the most popular educational applications in OpenAI's GPT Store with over 1 million conversations, PRSA uncovered a hidden Easter egg that had not been revealed previously. In addition, our analysis reveals that higher mutual information between a prompt and its output correlates with an increased risk of leakage. This insight guides the design and evaluation of two potential defenses against the security threats posed by PRSA. We have reported these findings to the prompt service vendors, including PromptBase and OpenAI, and are actively collaborating with them to implement defensive measures.
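To make the mutual-information claim concrete, the sketch below computes $I(P;O)$ in bits for a small discrete joint distribution, where $P$ stands for a choice of prompt and $O$ for the resulting output. This is only an illustration of the quantity itself under a toy assumption (prompts and outputs drawn from tiny finite sets); it is not the paper's estimation procedure, and the function name `mutual_information` is hypothetical.

```python
import math


def mutual_information(joint):
    """Return I(P;O) in bits for a discrete joint distribution.

    Hypothetical helper for illustration only.
    joint[i][j] = Pr(P = p_i, O = o_j); all entries must sum to 1.
    """
    # Marginals: sum over rows for P, over columns for O.
    p_prompt = [sum(row) for row in joint]
    p_output = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for i, row in enumerate(joint):
        for j, p_joint in enumerate(row):
            # Zero-probability cells contribute nothing (0 * log 0 := 0).
            if p_joint > 0:
                mi += p_joint * math.log2(p_joint / (p_prompt[i] * p_output[j]))
    return mi


# Output fully determined by the prompt: observing O reveals P completely.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))
# Output independent of the prompt: observing O reveals nothing about P.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))
```

In the first toy case the output carries one full bit about which prompt was used, and in the second it carries none. Intuitively, the more an output reveals about its prompt, the more an attacker observing outputs can recover, which is the direction of the correlation the abstract reports.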

Paper Structure (53 sections, 2 theorems, 20 equations, 16 figures, 15 tables, 2 algorithms)

Key Result

Lemma 1

The binary entropy function $H_b(x)$ is monotonically increasing on the interval $[0, 0.5]$ and monotonically decreasing on the interval $[0.5, 1]$.
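The lemma follows from the standard definition $H_b(x) = -x\log_2 x - (1-x)\log_2(1-x)$, whose derivative $H_b'(x) = \log_2\frac{1-x}{x}$ is positive for $x < 0.5$, zero at $x = 0.5$, and negative for $x > 0.5$. The sketch below checks this numerically on a grid. It is a sanity check, not a proof, and the helper names `binary_entropy` and `is_strictly_monotone` are illustrative rather than taken from the paper.

```python
import math


def binary_entropy(x):
    """H_b(x) = -x log2 x - (1 - x) log2(1 - x), with H_b(0) = H_b(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0  # limit value, since x log x -> 0 as x -> 0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def is_strictly_monotone(values, increasing=True):
    """True if consecutive values strictly increase (or strictly decrease)."""
    pairs = zip(values, values[1:])
    return all((b > a) if increasing else (b < a) for a, b in pairs)


# Sample both halves of [0, 1]; each grid includes the peak at x = 0.5.
grid_left = [i / 100 for i in range(0, 51)]     # 0.00, 0.01, ..., 0.50
grid_right = [i / 100 for i in range(50, 101)]  # 0.50, 0.51, ..., 1.00
left = [binary_entropy(x) for x in grid_left]
right = [binary_entropy(x) for x in grid_right]

print(is_strictly_monotone(left, increasing=True))    # increasing on [0, 0.5]
print(is_strictly_monotone(right, increasing=False))  # decreasing on [0.5, 1]
print(binary_entropy(0.5))                             # maximum value: 1 bit
```

A grid check like this can only fail to find a counterexample; the sign of $H_b'(x)$ is what actually establishes the lemma.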

Figures (16)

  • Figure 1: Prompt services in the real world.
  • Figure 2: Comparison of prompt leaking attacks and prompt stealing attacks in unauthorized access to system prompts.
  • Figure 3: t-SNE projection of the differences between outputs from stolen and target prompts. The stolen prompts are generated by GPT-3.5.
  • Figure 4: Overview of PRSA.
  • Figure 5: Similarity scores assessed by humans for outputs from target and stolen prompts. The stolen prompts are generated by PRSA and baseline methods.
  • ...and 11 more figures

Theorems & Definitions (4)

  • Lemma 1
  • Proof of Lemma 1
  • Theorem 1
  • Proof of Theorem 1