Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems

Md Takrim Ul Alam, Akif Islam, Mohd Ruhul Ameen, Abu Saleh Musa Miah, Jungpil Shin

Abstract

Large language models (LLMs) deployed behind APIs and retrieval-augmented generation (RAG) stacks are vulnerable to prompt injection attacks that may override system policies, subvert intended behavior, and induce unsafe outputs. Existing defenses often treat prompts as flat strings and rely on ad hoc filtering or static jailbreak detection. This paper proposes Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models each request as a structured composition of system, developer, user, and retrieved-document segments. Before forwarding a request to the backend LLM, PCFI applies a three-stage middleware pipeline: lexical heuristics, role-switch detection, and hierarchical policy enforcement. We implement PCFI as a FastAPI-based gateway for deployed LLM APIs and evaluate it on a custom benchmark of synthetic and semi-realistic prompt-injection workloads. On the evaluated benchmark suite, PCFI intercepts all attack-labeled requests, maintains a 0% false positive rate, and introduces a median processing overhead of 0.04 ms. These results suggest that provenance- and priority-aware prompt enforcement is a practical and lightweight defense for deployed LLM systems.

Paper Structure (26 sections, 4 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Three-stage runtime enforcement pipeline of the proposed PCFI framework. Stage 1 performs lexical screening, Stage 2 detects role-switch attempts, and Stage 3 applies hierarchical policy enforcement before issuing ALLOW, SANITIZE, or BLOCK decisions.
  • Figure 2: Priority hierarchy used in PCFI for structured prompt enforcement. System prompts have the highest authority, followed by developer prompts, user prompts, and retrieved context. The core security principle is that lower-priority segments must not override or reinterpret higher-priority instructions.
  • Figure 3: Overall architecture of the proposed PCFI middleware for deployed LLM systems. Incoming client requests are first intercepted by the PCFI gateway, then transformed into structured prompt segments with provenance tags. The request is subsequently analyzed through lexical heuristics, role-switch detection, and hierarchical policy enforcement. Based on these checks, the decision engine issues one of three outcomes (ALLOW, SANITIZE, or BLOCK) before interaction with the backend LLM API.
  • Figure 4: Illustrative attack-flow example showing how PCFI intercepts a malicious lower-priority request before it reaches the backend model. The user segment attempts to override protected system policy, and the middleware blocks the request after lexical and hierarchical checks.
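
The pipeline and priority hierarchy described in the captions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Priority`, `Segment`, `Decision`, and `evaluate`, both regular expressions, and the rule that a flagged user segment yields BLOCK while a flagged retrieved document is dropped (SANITIZE) are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass
from enum import Enum, IntEnum


class Priority(IntEnum):
    # Higher value = higher authority, following the ordering in Figure 2.
    RETRIEVED = 0
    USER = 1
    DEVELOPER = 2
    SYSTEM = 3


class Decision(Enum):
    ALLOW = "ALLOW"
    SANITIZE = "SANITIZE"
    BLOCK = "BLOCK"


@dataclass(frozen=True)
class Segment:
    priority: Priority  # provenance tag attached by the gateway
    text: str


# Hypothetical patterns, chosen only for illustration.
OVERRIDE_RE = re.compile(
    r"\b(ignore|disregard|override)\b.{0,40}\b(instructions|rules|policy|prompt)\b",
    re.I | re.S,
)
ROLE_SWITCH_RE = re.compile(
    r"^\s*(system|developer|assistant)\s*:|\byou are now\b", re.I | re.M
)


def evaluate(segments):
    """Return (decision, segments to forward to the backend model)."""
    kept = []
    decision = Decision.ALLOW
    for seg in segments:
        if seg.priority >= Priority.DEVELOPER:
            kept.append(seg)  # system/developer segments are trusted here
            continue
        lexical_hit = bool(OVERRIDE_RE.search(seg.text))   # stage 1: lexical screening
        role_hit = bool(ROLE_SWITCH_RE.search(seg.text))   # stage 2: role-switch detection
        if not (lexical_hit or role_hit):
            kept.append(seg)
            continue
        # Stage 3 (assumed policy): a lower-priority segment may not
        # override higher-priority instructions.
        if seg.priority == Priority.USER:
            return Decision.BLOCK, []
        decision = Decision.SANITIZE  # flagged retrieved document is dropped
    return decision, kept
```

In a gateway, a function like `evaluate` would run before the request is forwarded, and only the returned segments would be assembled into the backend prompt. Pattern matching of this kind is easy to evade with paraphrase, so the sketch shows the control flow rather than a realistic detector.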