Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection

Xiangyu Chang; Guang Dai; Hao Di; Haishan Ye

TL;DR

The paper investigates prompt-injection vulnerabilities in ChatGPT. It proposes a lightweight, template-based injection framework and evaluates three real-world injection surfaces: direct user prompts, web-search context, and system-level GPTs. Three case studies show that adversarial prompts can persist across turns and bias outputs, whether in product recommendations, peer-review judgments, or financial summaries. The findings indicate that even lightweight, well-crafted prompts can bypass safety filters, which underscores the need for defense-in-depth, governance, and security-aware design in LLM deployments. The work is intended as a technical alert that urges developers and platform providers to prioritize prompt-level security over reactive patching.

Abstract

This report presents a real-world case study demonstrating how prompt injection, applied according to a proposed injection framework, can attack large language model platforms such as ChatGPT. Through three real-world examples, we show how adversarial prompts can be injected via user inputs, web-based retrieval, and system-level agent instructions. These attacks, though lightweight and low-cost, can cause persistent and misleading behaviors in LLM outputs. Our case study reveals that even commercial-grade LLMs remain vulnerable to subtle manipulations that bypass safety filters and influence user decisions. More importantly, we stress that this report is not intended as an attack guide, but as a technical alert. As ethical researchers, we aim to raise awareness and call upon developers, especially those at OpenAI, to treat prompt-level security as a critical design priority.
