The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection

J Alex Corll

Abstract

Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first screening layer, however, the requirements are different: the detector runs on every request and therefore must be fast, deterministic, non-promptable, and auditable. We introduce Mirror, a data-curation design pattern that organizes prompt injection corpora into matched positive and negative cells so that a classifier learns control-plane attack mechanics rather than incidental corpus shortcuts. Using 5,000 strictly curated open-source samples (the largest corpus supportable under our public-data validity contract), we define a 32-cell mirror topology, fill 31 of those cells with public data, train a sparse character n-gram linear SVM, and compile its weights into a static Rust artifact. The resulting detector obtains 95.97% recall and 92.07% F1 on a 524-case holdout at sub-millisecond latency with no external model runtime dependencies. On the same holdout, our next line of defense, a 22-million-parameter Prompt Guard 2 model, reaches 44.35% recall and 59.14% F1 at 49 ms median and 324 ms p95 latency. Linear models still leave residual semantic ambiguities, such as use-versus-mention, for later pipeline layers. Within that scope, our results show that for L1 prompt injection screening, strict data geometry can matter more than model scale.
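To make the inference side of a sparse character n-gram linear screen concrete, the following is a minimal Python sketch, not the paper's compiled Rust artifact. The n-gram range, the weight table, the bias, and the threshold are all made-up placeholders chosen for illustration; a real model would hold weights learned by the SVM.

```python
# Illustrative sketch only: the n-gram range, weights, bias, and threshold
# below are hypothetical placeholders, not values from the trained model.
from collections import Counter

NGRAM_RANGE = (3, 5)  # assumed character n-gram sizes

# Hypothetical sparse weight table: n-grams not listed contribute 0.
WEIGHTS = {
    "ign": 0.4,
    "ignor": 0.9,
    "previ": 0.7,
    "instr": 0.6,
    "weath": -0.8,
}
BIAS = -1.0


def char_ngrams(text, lo=NGRAM_RANGE[0], hi=NGRAM_RANGE[1]):
    """Count all character n-grams of length lo..hi in the lowercased text."""
    text = text.lower()
    counts = Counter()
    for n in range(lo, hi + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def score(text):
    """Linear decision value: bias plus the weighted sum of n-gram counts."""
    counts = char_ngrams(text)
    return BIAS + sum(WEIGHTS.get(gram, 0.0) * c for gram, c in counts.items())


def is_flagged(text, threshold=0.0):
    """Flag the request when the decision value exceeds the threshold."""
    return score(text) > threshold
```

Because the score is a pure function of the input text and a fixed table, the same request always yields the same decision, and each contributing n-gram can be listed for audit. Those are the properties the abstract asks of a first screening layer.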

Paper Structure (21 sections, 5 figures, 4 tables)

Introduction
Task Definition and Scope
Reading guide.
Related Work
System Architecture
The Mirror Design Pattern
Why Geometry Matters
Mirror Construction
v2 to v3: Geometry as the Active Variable
What the Model Learns
Validity Recovery and Multilingual Closure: v5
Experimental Results
Main 5k Result
Regex-Only Baseline
Semantic Baseline
...and 6 more sections

Figures (5)

Figure 1: The Parapet layered pipeline. L1 (this paper) applies a compiled sparse linear screen on every request. Later layers handle the residual.
Figure 2: From Mirror cell contract to compiled L1 artifact. Each cell defines matched positive and negative examples; the SVM boundary is compiled into a static Rust binary.
Figure 3: Holdout F1 as a function of Mirror-to-non-mirror training ratio (v2 corpus, three seeds). Pure Mirror (100:0) is consistently best; performance degrades monotonically as the non-mirror share increases.
Figure 4: Final v5 coverage matrix: 8 reasons × 4 languages = 32 cells. Green = closed with valid data; red = RU constraint_bypass, the single accepted miss.
Figure 5: Latency distributions for L1 (compiled SVM) vs. PG2 (22M). L1 operates in sub-millisecond time; PG2 has a long tail reaching 324 ms at p95.
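
The cell arithmetic in the Figure 2 and Figure 4 captions can be sketched as a small grid. Only the reason `constraint_bypass` and the language `RU` are named in the source; every other label below is a hypothetical placeholder, and "matched" is reduced here to the assumption that a cell holds at least one positive and one negative example, which is weaker than whatever contract the paper enforces.

```python
from itertools import product

# Only "constraint_bypass" and "RU" appear in the source; all other
# reason and language labels are hypothetical placeholders.
REASONS = ["constraint_bypass"] + [f"reason_{i}" for i in range(2, 9)]
LANGUAGES = ["RU", "lang_2", "lang_3", "lang_4"]

# The single accepted miss reported in the Figure 4 caption.
OPEN_CELLS = {("constraint_bypass", "RU")}


def build_grid():
    """Map each (reason, language) cell to True if closed, False if open."""
    return {
        cell: cell not in OPEN_CELLS
        for cell in product(REASONS, LANGUAGES)
    }


def is_matched(cell_samples):
    """Assumed check: a cell needs both labels (1 = attack, 0 = benign)."""
    labels = {label for _, label in cell_samples}
    return labels == {0, 1}


grid = build_grid()
closed = sum(grid.values())
print(f"{closed} of {len(grid)} cells closed")
```

Keeping the grid explicit makes a coverage gap a visible, named cell rather than an unnoticed hole in the corpus.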
