Attention Meets Reachability: Structural Equivalence and Efficiency in Grammar-Constrained LLM Decoding

Faruk Alpay; Bilge Senturk

TL;DR

An oracle invariance theorem is proved: language-equivalent grammars induce identical admissible next-token sets for every prefix, hence identical logit masks, yet can yield provably different compiled state spaces and online ambiguity costs.
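The invariance claim can be checked by brute force on a small case. The sketch below is illustrative only and is not the paper's construction: it takes two grammars for $a^n b^n$, one direct and one with redundant delegation through hypothetical nonterminals `A` and `B`, enumerates their languages up to a length bound, and compares the admissible next-token sets over single-character tokens. The names `language_up_to`, `admissible_next`, and the `<eos>` marker are assumptions introduced here. The nonterminal count is used only as a crude stand-in for compiled size, not as the paper's control-state count.

```python
from collections import deque
from itertools import product

EOS = "<eos>"      # hypothetical end-of-sequence marker, not from the paper
ALPHABET = "ab"    # single-character tokens, a simplifying assumption
MAX_LEN = 12       # enumeration bound on word length
PREFIX_LEN = 5     # prefixes this short are answered exactly under MAX_LEN


def language_up_to(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    grammar maps each nonterminal to a list of right-hand sides (tuples).
    Uses breadth-first leftmost derivation and prunes sentential forms that
    already contain more than max_len terminals.
    """
    seen = {(start,)}
    queue = deque([(start,)])
    words = set()
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:  # no nonterminal left: form is a word
            words.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminals = sum(1 for s in new if s not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


def admissible_next(words, prefix, alphabet=ALPHABET):
    """Tokens that keep prefix extendable to some word; EOS if prefix is a word."""
    allowed = {c for c in alphabet
               if any(w.startswith(prefix + c) for w in words)}
    if prefix in words:
        allowed.add(EOS)
    return allowed


# Direct grammar: S -> a S b | epsilon
G_DIRECT = {"S": [("a", "S", "b"), ()]}
# Same language with redundant delegation: S -> A, A -> B, B -> a S b | epsilon
G_DELEGATED = {"S": [("A",)], "A": [("B",)], "B": [("a", "S", "b"), ()]}

WORDS_DIRECT = language_up_to(G_DIRECT, "S", MAX_LEN)
WORDS_DELEGATED = language_up_to(G_DELEGATED, "S", MAX_LEN)

prefixes = ["".join(p) for n in range(PREFIX_LEN + 1)
            for p in product(ALPHABET, repeat=n)]
masks_agree = all(
    admissible_next(WORDS_DIRECT, p) == admissible_next(WORDS_DELEGATED, p)
    for p in prefixes
)

print(masks_agree, len(G_DIRECT), len(G_DELEGATED))
```

Run as written, this prints `True 1 3`: the masks coincide on every tested prefix while the delegated grammar carries three nonterminals instead of one. The agreement is unsurprising here, because the mask is computed from the language alone. That is the content of the invariance statement; the cost differences arise only once a grammar is compiled and executed.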

Abstract

We study grammar-constrained decoding (GCD) as a coupling between an autoregressive next-token distribution and a reachability oracle over a pushdown system compiled from a context-free grammar (CFG). We prove an oracle invariance theorem: language-equivalent grammars induce identical admissible next-token sets for every prefix, hence identical logit masks, yet can yield provably different compiled state spaces and online ambiguity costs. We give exact control-state blowup counts for the canonical $a^n b^n$ language under redundant nonterminal delegation, and introduce a left-to-right structural ambiguity cost (SAC) measuring incremental packed-parse-forest growth per token. For two equivalent grammars over all finite strings, SAC is $O(1)$ per token under right-recursion but $\Theta(t^2)$ per token and $\Theta(n^3)$ cumulatively under concatenation. We establish engine-independent lower bounds: any sound, retrieval-efficient, parse-preserving online masking engine must incur $\Omega(t^2)$ work per token on a specific constant-size CFG family, unconditionally within this model. We define decoding-cost equivalence classes of grammars and prove existence of minimal-SAC representatives within bounded rewrite families. Finally, we characterize the true conditional sampler via a Doob $h$-transform and derive sharp one-step KL and total-variation distortion bounds for hard-masked decoding in terms of survival-probability spread among admissible next tokens. We integrate these results with Transformer and Mixture-of-Experts architectures, derive latency envelopes in terms of vocabulary size, active state sets, and beam width, and connect SAC to instrumentation-based predictive performance models and automated grammar optimization.
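To make the contrast between the true conditional sampler and hard masking concrete, the following is a standard one-step derivation written in notation assumed here, which may differ from the paper's: $p(v \mid x)$ is the unconstrained model, $\mathcal{L}$ the target language, $A(x)$ the admissible next-token set at prefix $x$, and $h(x)$ the survival probability. It is a sketch of why survival-probability spread is the relevant quantity. It does not reproduce the paper's KL or total-variation bounds.

```latex
% Survival probability under the unconstrained model (notation assumed here),
% where Y denotes the completed string sampled from p:
h(x) \;=\; \Pr_{p}\bigl[\, Y \in \mathcal{L} \,\bigm|\, Y \text{ begins with } x \,\bigr],
\qquad
h(x) \;=\; \sum_{v} p(v \mid x)\, h(xv).

% True conditional sampler (Doob h-transform):
p^{\ast}(v \mid x) \;=\; p(v \mid x)\,\frac{h(xv)}{h(x)}.

% Hard-masked sampler, with normalizer Z(x):
q(v \mid x) \;=\; \frac{p(v \mid x)\,\mathbf{1}[v \in A(x)]}{Z(x)},
\qquad
Z(x) \;=\; \sum_{u \in A(x)} p(u \mid x).

% Since h(xu) = 0 for u \notin A(x), we have h(x) = Z(x)\,\mathbb{E}_{U \sim q(\cdot \mid x)}[h(xU)],
% so for every admissible v:
\frac{p^{\ast}(v \mid x)}{q(v \mid x)}
\;=\; \frac{h(xv)}{\mathbb{E}_{U \sim q(\cdot \mid x)}\bigl[h(xU)\bigr]}.
```

Under these assumptions the two samplers agree at a step exactly when $h(xv)$ takes the same value for every admissible $v$. Any one-step distortion therefore comes from variation of the survival probability across the admissible set.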

Paper Structure (43 sections, 25 theorems, 56 equations)

Introduction
Contributions.
Decoding as Pushdown Reachability
Languages, prefixes, and tokenization
Pushdown automata and CFG compilation
Engine semantics for left-to-right masking
Structural Equivalence, Masked Decoding, and Probability Constraints
Oracle invariance under language equivalence
Masked decoding as a constrained stochastic process
Formal State-Space Blowup from Nonterminal Delegation
Static control-state counts
Online consequences for masking engines
Structural Ambiguity Cost and Parse Forest Density
Two equivalent grammars for $\Sigma^*$
Parse trees and Catalan ambiguity
...and 28 more sections

Key Result

Proposition 1

For every CFG $G$, the compiled NPDA $\mathcal{A}_G$ accepts exactly $\mathcal{L}(G)$.

Theorems & Definitions (94)

Definition 1: CFG
Definition 2: Prefix closure and one-step extension
Definition 3: Tokenizer homomorphism
Definition 4: Nondeterministic PDA
Definition 5: Recursive-transition-network compilation
Proposition 1: Correctness of compilation
proof
Definition 6: Reachability and liveness
Definition 7: Admissible next vocabulary tokens
Remark 1: Reachability as the computational bottleneck
...and 84 more
