ChopChop: a Programmable Framework for Semantically Constraining the Output of Language Models
Shaan Nagy, Timothy Zhou, Nadia Polikarpova, Loris D'Antoni
TL;DR
ChopChop introduces a programmable framework for semantic constrained decoding, enabling language models to generate code that satisfies user-defined semantic properties by constraining abstract syntax trees (ASTs) rather than surface strings. It formalizes completability as realizability over spaces of ASTs represented as regular coterms, and uses coinductive, derivative-based parsers to compute prefix spaces, which are then pruned by user-defined semantic pruners. The approach is instantiated in two challenging domains (equivalence-guided decoding using e-graphs and type-safe decoding for a TypeScript subset), where it improves correctness and success rates with manageable overhead. Overall, ChopChop bridges formal methods and language-model outputs, offering a flexible, programmable path to reliable code generation and outlining avenues for efficiency and broader semantic constraints.
Abstract
Language models (LMs) can generate code but cannot guarantee its correctness, often producing outputs that violate type safety, program invariants, or other semantic properties. Constrained decoding offers a solution by restricting generation to only produce programs that satisfy user-defined properties. However, existing methods are either limited to syntactic constraints or rely on brittle, ad hoc encodings of semantic properties over token sequences rather than program structure. We present ChopChop, the first programmable framework for constraining the output of LMs with respect to semantic properties. ChopChop introduces a principled way to construct constrained decoders based on analyzing the space of programs a prefix represents. It formulates this analysis as a realizability problem which is solved via coinduction, connecting token-level generation with structural reasoning over programs. We demonstrate ChopChop's generality by using it to enforce (1) equivalence to a reference program and (2) type safety. Across a range of models and tasks, ChopChop improves success rates while maintaining practical decoding latency.
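To make the idea of checking whether a prefix is still completable more concrete, the following is a minimal toy sketch, not ChopChop's implementation. It assumes a drastically simplified setting: the "language of programs" is a plain regular expression over characters, and completability is decided with Brzozowski derivatives. ChopChop itself reasons over spaces of ASTs and applies user-defined semantic pruners, which this sketch does not model. All names here (`deriv`, `completable`, `allowed_tokens`, `LANG`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Toy regular-expression AST standing in for a grammar of programs.
@dataclass(frozen=True)
class Empty: pass            # matches no string
@dataclass(frozen=True)
class Eps: pass              # matches only the empty string
@dataclass(frozen=True)
class Chr:
    c: str                   # matches one character
@dataclass(frozen=True)
class Alt:
    l: object
    r: object
@dataclass(frozen=True)
class Seq:
    l: object
    r: object
@dataclass(frozen=True)
class Star:
    r: object

def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)): return True
    if isinstance(r, (Empty, Chr)): return False
    if isinstance(r, Alt): return nullable(r.l) or nullable(r.r)
    return nullable(r.l) and nullable(r.r)  # Seq

def is_empty(r):
    """True if r matches no string at all."""
    if isinstance(r, Empty): return True
    if isinstance(r, Alt): return is_empty(r.l) and is_empty(r.r)
    if isinstance(r, Seq): return is_empty(r.l) or is_empty(r.r)
    return False  # Eps, Chr, Star always match something

def alt(l, r):
    # Simplifying constructor: drop empty branches.
    if isinstance(l, Empty): return r
    if isinstance(r, Empty): return l
    return Alt(l, r)

def seq(l, r):
    # Simplifying constructor: collapse empty and epsilon cases.
    if isinstance(l, Empty) or isinstance(r, Empty): return Empty()
    if isinstance(l, Eps): return r
    return Seq(l, r)

def deriv(r, ch):
    """Brzozowski derivative: the language of suffixes after reading ch."""
    if isinstance(r, (Empty, Eps)): return Empty()
    if isinstance(r, Chr): return Eps() if r.c == ch else Empty()
    if isinstance(r, Alt): return alt(deriv(r.l, ch), deriv(r.r, ch))
    if isinstance(r, Star): return seq(deriv(r.r, ch), r)
    first = seq(deriv(r.l, ch), r.r)  # Seq
    return alt(first, deriv(r.r, ch)) if nullable(r.l) else first

def completable(r, prefix):
    """A prefix is completable if some string in r extends it."""
    for ch in prefix:
        r = deriv(r, ch)
    return not is_empty(r)

def allowed_tokens(r, prefix, vocab):
    """Tokens the decoder may emit next without leaving the language."""
    return [t for t in vocab if completable(r, prefix + t)]

# Toy language (ab)*c and a toy multi-character vocabulary.
LANG = Seq(Star(Seq(Chr("a"), Chr("b"))), Chr("c"))
VOCAB = ["a", "ab", "b", "c", "abc", "ca"]
print(allowed_tokens(LANG, "", VOCAB))
print(allowed_tokens(LANG, "a", VOCAB))
```

In a real decoder, a list like the one returned by `allowed_tokens` would be used to mask the model's next-token distribution. A semantic constraint such as type safety would additionally require discarding completions that are syntactically valid but violate the property, which is the role the paper assigns to pruners over AST spaces.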
