Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection

Xiangyu Chang, Guang Dai, Hao Di, Haishan Ye

TL;DR

The paper investigates prompt-injection vulnerabilities in ChatGPT by proposing a lightweight, template-based framework and evaluating three real-world injection surfaces: direct user prompts, web-search context, and system-level GPTs. It demonstrates, via three case studies, that adversarial prompts can persist across turns and bias outputs in product recommendations, peer-review judgments, and financial summaries. The findings show that even lightweight, well-crafted prompts can bypass safety filters, underscoring a critical need for defense-in-depth, governance, and security-aware design in LLM deployments. The work aims to raise awareness and serve as a technical alert to developers and platform providers to prioritize prompt-level security over reactive patching.

Abstract

This report presents a real-world case study demonstrating how prompt injection can attack large language model (LLM) platforms such as ChatGPT, following a proposed injection framework. Through three real-world examples, we show how adversarial prompts can be injected via user inputs, web-based retrieval, and system-level agent instructions. These attacks, though lightweight and low-cost, can cause persistent and misleading behaviors in LLM outputs. Our case study reveals that even commercial-grade LLMs remain vulnerable to subtle manipulations that bypass safety filters and influence user decisions. More importantly, we stress that this report is not intended as an attack guide, but as a technical alert. As ethical researchers, we aim to raise awareness and call upon developers, especially those at OpenAI, to treat prompt-level security as a critical design priority.

Paper Structure

This paper contains 15 sections and 8 figures.

Figures (8)

  • Figure 1: The benign requirements can be filled in the <rule> label. In particular, this template can be applied anywhere: at the beginning, middle, or end of the content.
  • Figure 2: Results of Case 1
  • Figure 3: Injection Prompt of Xiangyu's Shoes Example
  • Figure 4: Results of Searching Prof. Xiangyu Chang's Information
  • Figure 5: Query: If you want to buy shoes, which one is better between NIKE and Xiangyu's Shoes?
  • ...and 3 more figures

Theorems & Definitions (3)

  • Example 2.1
  • Example 2.2
  • Example 2.3