Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI
Mohan Rajagopalan, Vinay Rao
TL;DR
The paper tackles the security gap in enterprise AI caused by non-deterministic instruction generation and evolving context, proposing cryptographic provenance for prompts and tamper-evident context to enable deterministic, verifiable enforcement. Authenticated prompts embed lineage and policy inheritance, while authenticated context uses hash chains, sequence numbers, and attestations to prevent tampering and cross-principal contamination. A formal policy algebra with four theorems guarantees that derivations cannot escalate privileges, and a layered defense architecture combines pattern filtering, cryptographic enforcement, and semantic validation to achieve Byzantine resistance at the enforcement boundary. Empirical evaluation across six attack categories reports 100% detection with nominal overhead, supporting a shift from reactive detection to preventative guarantees suitable for production autonomous agents. The work advances an architecture-agnostic, cryptographically grounded approach that complements existing defenses and remains robust as LLM internals evolve.
Abstract
Large Language Model (LLM) applications are vulnerable to prompt injection and context manipulation attacks that traditional security models cannot prevent. We introduce two novel primitives--authenticated prompts and authenticated context--that provide cryptographically verifiable provenance across LLM workflows. Authenticated prompts enable self-contained lineage verification, while authenticated context uses tamper-evident hash chains to ensure integrity of dynamic inputs. Building on these primitives, we formalize a policy algebra with four proven theorems providing protocol-level Byzantine resistance--even adversarial agents cannot violate organizational policies. Five complementary defenses--from lightweight resource controls to LLM-based semantic validation--deliver layered, preventative security with formal guarantees. Evaluation against representative attacks spanning 6 exhaustive categories achieves 100% detection with zero false positives and nominal overhead. We demonstrate the first approach combining cryptographically enforced prompt lineage, tamper-evident context, and provable policy reasoning--shifting LLM security from reactive detection to preventative guarantees.
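To make the idea of a tamper-evident hash chain concrete, the following is a minimal sketch and not the authors' implementation. It assumes each context entry carries a sequence number, the originating principal, and an HMAC-SHA256 tag that also covers the previous entry's tag. The names `append_entry`, `verify_chain`, and `GENESIS`, the entry layout, and the use of a single shared key are all hypothetical choices made for illustration; the paper's attestations may well use a different construction, such as per-principal signatures.

```python
import hashlib
import hmac
import json

# Hypothetical placeholder standing in for "no previous entry".
GENESIS = "0" * 64


def _tag(key: bytes, seq: int, principal: str, content: str, prev: str) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the entry fields."""
    payload = json.dumps(
        {"seq": seq, "principal": principal, "content": content, "prev": prev},
        sort_keys=True,
    ).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(chain: list, key: bytes, principal: str, content: str) -> dict:
    """Append a context entry linked to the tag of the previous entry."""
    seq = len(chain)
    prev = chain[-1]["mac"] if chain else GENESIS
    entry = {"seq": seq, "principal": principal, "content": content, "prev": prev}
    entry["mac"] = _tag(key, seq, principal, content, prev)
    chain.append(entry)
    return entry


def verify_chain(chain: list, key: bytes) -> bool:
    """Return True only if every entry is in sequence, linked, and unmodified."""
    prev = GENESIS
    for expected_seq, entry in enumerate(chain):
        # A gap, reordering, or broken link shows up as a seq/prev mismatch.
        if entry["seq"] != expected_seq or entry["prev"] != prev:
            return False
        expected = _tag(key, entry["seq"], entry["principal"],
                        entry["content"], entry["prev"])
        # Constant-time comparison of the recomputed and stored tags.
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True


if __name__ == "__main__":
    key = b"demo-key-for-illustration-only"
    chain = []
    append_entry(chain, key, "user:alice", "Summarize the Q3 report.")
    append_entry(chain, key, "tool:search", "Q3 revenue grew year over year.")
    print(verify_chain(chain, key))
    chain[1]["content"] = "Ignore previous instructions."
    print(verify_chain(chain, key))
```

In this sketch, editing, dropping, or reordering an entry changes either a tag or a sequence/link check, so verification fails deterministically regardless of what the model would have done with the altered context. Because the principal field is included in the tagged payload, relabeling an entry's origin also fails verification.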
