Synthesizing Formal Semantics from Executable Interpreters
Jiangyi Liu, Charlie Murphy, Anvay Grover, Keith J. C. Johnson, Thomas Reps, Loris D'Antoni
TL;DR
This work presents a practical method to automatically synthesize formal, CHC-based semantics for a language from an existing interpreter, removing the burden of hand-writing semantics. Using a per-production, inductive synthesis with CEGIS, the approach constructs a SemGuS-compatible semantics that is equivalent to the given interpreter, enabling verification and synthesis workflows on real languages. The authors implement Synantic, demonstrate scalability across a broad benchmark suite, and reveal both correctness gains and performance bottlenecks, including a bug in a RegEx semantics. The results show the approach can synthesize non-trivial language semantics and often matches manually written CHCs, with multi-output optimization improving scalability for more complex productions, thereby facilitating practical semantic formalization for verification tools.
Abstract
Program verification and synthesis frameworks that allow one to customize the language in which one is interested typically require the user to provide a formally defined semantics for the language. Because writing a formal semantics can be a daunting and error-prone task, this requirement stands in the way of such frameworks being adopted by non-expert users. We present an algorithm that can automatically synthesize inductively defined syntax-directed semantics when given (i) a grammar describing the syntax of a language and (ii) an executable (closed-box) interpreter for computing the semantics of programs in the language of the grammar. Our algorithm synthesizes the semantics in the form of Constrained-Horn Clauses (CHCs), a natural, extensible, and formal logical framework for specifying inductively defined relations that has recently received widespread adoption in program verification and synthesis. The key innovation of our synthesis algorithm is a Counterexample-Guided Synthesis (CEGIS) approach that breaks the hard problem of synthesizing a set of constrained Horn clauses into small, tractable expression-synthesis problems that can be dispatched to existing SyGuS synthesizers. Our tool Synantic synthesized inductively-defined formal semantics from 14 interpreters for languages used in program-synthesis applications. When synthesizing formal semantics for one of our benchmarks, Synantic unveiled an inconsistency in the semantics computed by the interpreter for a language of regular expressions; fixing the inconsistency resulted in a more efficient semantics and, for some cases, in a 1.2x speedup for a synthesizer solving synthesis problems over such a language.
