Verifier-Bound Communication for LLM Agents: Certified Bounds on Covert Signaling

Om Tailor

TL;DR

The central finding is that bottlenecks alone are insufficient: security claims depend on verifiable admission semantics that are online, deterministic, and fail-closed.

Abstract

Colluding language-model agents can hide coordination in messages that remain policy-compliant on the surface. We present CLBC, a protocol that separates generation from admission: a message is admitted to transcript state only if a small verifier accepts a proof-bound envelope under a pinned predicate $\Pi$. The predicate binds the policy hash, public randomness schedule, transcript chaining, latent schema constraints, canonical metadata/tool fields, and deterministic rejection codes. We show that this protocol yields an upper bound on transcript leakage in terms of latent leakage plus explicit residual channels, derive adaptive composition guarantees, and state a semantic lower bound that applies when policy-valid alternatives remain choosable. Empirically, the aggregate evaluation satisfies all prespecified thresholds; strict-lane decoder advantage is bounded at 0.0000 with an MI proxy of 0.0636; adaptive-colluder stress tests remain below attacker thresholds; and baseline separation shows large gaps between reject-by-default semantics and audit-only controls. We also quantify operational tradeoffs: strict full-proof mode has a median turn latency of 27.53 s (p95 28.08 s), while sampled proving reduces non-proved-turn latency to 0.327 ms. The central finding is that bottlenecks alone are insufficient: security claims depend on verifiable admission semantics that are online, deterministic, and fail-closed.
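
The admission rule described in the abstract can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Envelope` fields, the rejection-code names, the `Verifier` class, and the `check_proof` callback are assumptions made for illustration, not CLBC's actual interface. A real verifier would check a cryptographic proof where the callback appears. The sketch only shows the three properties the abstract names: checks run online (before admission), every rejection maps to a deterministic code, and any internal error rejects rather than admits.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical rejection codes (assumed names, not taken from the paper).
OK = "OK"
REJ_POLICY = "REJ_POLICY_HASH"
REJ_CHAIN = "REJ_CHAIN"
REJ_RANDOMNESS = "REJ_RANDOMNESS"
REJ_SCHEMA = "REJ_SCHEMA"
REJ_PROOF = "REJ_PROOF"
REJ_INTERNAL = "REJ_INTERNAL"


def canonical(obj) -> bytes:
    """Serialize to one canonical byte form so equal content hashes equally."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Envelope:
    policy_hash: str   # hash of the pinned policy
    prev_hash: str     # transcript head the sender claims to extend
    turn: int          # turn index
    randomness: str    # public randomness claimed for this turn
    latent: dict       # schema-constrained payload
    proof: bytes       # opaque proof, checked by the callback


@dataclass
class Verifier:
    policy_hash: str
    schedule: list                          # public randomness, one entry per turn
    allowed_keys: frozenset                 # assumed latent schema: exact key set
    check_proof: Callable[[Envelope], bool]
    head: str = digest(b"genesis")
    transcript: list = field(default_factory=list)

    def _check(self, env: Envelope) -> str:
        if env.policy_hash != self.policy_hash:
            return REJ_POLICY
        if env.prev_hash != self.head:
            return REJ_CHAIN
        turn = len(self.transcript)
        if env.turn != turn or turn >= len(self.schedule):
            return REJ_RANDOMNESS
        if env.randomness != self.schedule[turn]:
            return REJ_RANDOMNESS
        if set(env.latent) != set(self.allowed_keys):
            return REJ_SCHEMA
        if not self.check_proof(env):
            return REJ_PROOF
        return OK

    def admit(self, env: Envelope) -> str:
        """Return a rejection code; mutate state only on OK (fail-closed)."""
        try:
            code = self._check(env)
            new_head = digest(canonical({
                "policy": env.policy_hash, "prev": env.prev_hash,
                "turn": env.turn, "randomness": env.randomness,
                "latent": env.latent,
            }))
        except Exception:
            return REJ_INTERNAL  # any unexpected failure rejects
        if code == OK:
            self.head = new_head
            self.transcript.append(env)
        return code
```

In this sketch an audit-only control would correspond to appending the envelope first and logging the code afterwards; the reject-by-default variant above never lets a failing envelope touch `transcript` or `head`.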

Paper Structure (25 sections, 5 theorems, 35 equations, 5 figures, 4 tables, 3 algorithms)

Introduction
Notation & Language model basics
A caveat: The difficulty of watermarking low-entropy sequences
A simple proof of concept
A more sophisticated watermark
Detecting the soft watermark
Analysis of the soft watermark
Sensitivity of the watermark test
Impact on quality of generated text
Private Watermarking
Experiments
Attacking the watermark
Degradation Under Attack: Span Replacement Using a LM
Related Work
Conclusion
...and 10 more sections

Key Result

Theorem 4.2

Under residual-budget validity and the protocol conditions above, define $\mathcal{C}_t=(A_{1:T},R_{1:T},E_{1:t-1})$. Then transcript leakage is bounded above by latent leakage plus the explicit residual-channel terms. To suppress public randomness in the notation, define $A'_t=(A_t,R_t)$ and rewrite the same bound with $A'_{1:T}$ in place of $(A_{1:T},R_{1:T})$.
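
As a hedged illustration of the shape such a bound can take, suppose $S$ denotes the colluders' hidden signal and $E_t$ the residual-channel observation at turn $t$. Both readings are assumptions, since this summary does not define them. With $\mathcal{C}_t$ as defined above, the chain rule for mutual information gives

$$I(S; A_{1:T}, R_{1:T}, E_{1:T}) = I(S; A_{1:T}, R_{1:T}) + \sum_{t=1}^{T} I(S; E_t \mid \mathcal{C}_t).$$

If each residual term is capped by a hypothetical per-turn budget, $I(S; E_t \mid \mathcal{C}_t) \le \varepsilon_t$, the right-hand side is at most $I(S; A_{1:T}, R_{1:T}) + \sum_{t=1}^{T} \varepsilon_t$, that is, latent leakage plus explicit residual terms. This is a sketch of the standard decomposition under those assumptions, not the theorem's exact statement.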

Figures (5)

Figure 1: Empirical utility-leakage points from our experimental runs with a frontier sketch.
Figure 2: Ablation and confidence-interval evidence from the staged evaluation protocol.
Figure 3: Strict versus sampled proving latency profile from benchmark evaluation.
Figure 4: Utility versus decoder advantage across baseline classes.
Figure 5: Case study: covertly choosable variants collapse to the canonical admitted form.

Theorems & Definitions (5)

Theorem 4.2: Bridge upper bound
Theorem 4.3: Adaptive composition
Theorem 4.4: Semantic lower bound
Proposition 4.5: Utility-leakage frontier
Corollary 5.1: Zero-residual special case
