Intrinsic Verification of Parsers and Formal Grammar Theory in Dependent Lambek Calculus (Extended Version)
Steven Schaefer, Nathan Varner, Pedro H. Azevedo de Amorim, Max S. New
TL;DR
This work introduces Dependent Lambek Calculus (Lambek^D), a dependent, non-commutative linear-type theory that encodes grammars as linear types and parses as linear terms, enabling intrinsic correctness guarantees for parsers. It demonstrates that diverse grammar formalisms—finite grammars, regular expressions, nondeterministic and deterministic automata, and even context-free and unrestricted grammars—can be represented and manipulated within a single, extensible framework, with a denotational semantics tying grammar theory to parse transformers. A constructive Agda prototype provides machine-checked realizations of these encodings, together with a formal semantics that validates the equational theory and the soundness of parsing operations. The approach offers a unified, verifiable foundation for parsing across multiple grammar formalisms, paving the way for verified parser generators and secure semantic actions in real-world software systems.
Abstract
We present Dependent Lambek Calculus, a domain-specific dependent type theory for verified parsing and formal grammar theory. In $\textrm{Lambek}^D$, linear types are used as a syntax for formal grammars,and parsers can be written as linear terms. The linear typing restriction provides a form of intrinsic verification that a parser yields only valid parse trees for the input string. We demonstrate the expressivity of this system by showing that the combination of inductive linear types and dependency on non-linear data can be used to encode commonly used grammar formalisms such as regular and context-free grammars as well as traces of various types of automata. Using these encodings, we define parsers for regular expressions using deterministic automata, as well as examples of verified parsers of context-free grammars. We present a denotational semantics of our type theory that interprets the linear types as functions from strings to sets of abstract parse trees and terms as parse transformers. Based on this denotational semantics, we have made a prototype implementation of $\textrm{Lambek}^D$ using a shallow embedding in the Agda proof assistant. All of our examples parsers have been implemented in this prototype implementation.
