$\texttt{SEM-CTRL}$: Semantically Controlled Decoding
Mohammad Albinhassan, Pranava Madhyastha, Alessandra Russo
TL;DR
SEM-CTRL tackles the challenge of producing outputs that are both syntactically correct and semantically valid from off-the-shelf LLMs. It couples Answer Set Grammar-based semantic constraints with token-level Monte Carlo Tree Search to prune the decoding space and optimize for task goals, guaranteeing correctness by construction. Empirically, SEM-CTRL enables small models to rival or surpass larger reasoning models across synthetic grammar synthesis, combinatorial reasoning, and planning tasks, while maintaining strong semantic guarantees. The approach has practical implications for deploying reliable LLMs in domains requiring strict correctness and domain knowledge, reducing generation costs through principled search and caching.
Abstract
Ensuring both syntactic and semantic correctness in Large Language Model (LLM) outputs remains a significant challenge, despite being critical for real-world deployment. In this paper, we introduce $\texttt{SEM-CTRL}$, a unified approach that enforces rich context-sensitive constraints and task- and instance-specific semantics directly on an LLM decoder. Our approach integrates token-level Monte Carlo Tree Search (MCTS), guided by specific syntactic and semantic constraints. The constraints over the desired outputs are expressed using Answer Set Grammars -- a logic-based formalism that generalizes context-sensitive grammars while incorporating background knowledge to represent task-specific semantics. We show that our approach guarantees correct completions for any off-the-shelf LLM without the need for fine-tuning. We evaluate $\texttt{SEM-CTRL}$ on a range of tasks, including synthetic grammar synthesis, combinatorial reasoning, and planning. Our results demonstrate that $\texttt{SEM-CTRL}$ allows small pre-trained LLMs to efficiently outperform larger variants and state-of-the-art reasoning models (e.g., o1-preview) while simultaneously guaranteeing solution correctness.
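To make the idea of constraint-guided token-level search concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a toy setting: the textbook context-sensitive language $a^n b^n c^n$ stands in for an Answer Set Grammar, a hand-written `allowed` function stands in for the constraint checker, and uniform random rollouts stand in for an LLM's token distribution. All names (`allowed`, `reward`, `rollout`, `mcts_decode`, `MAX_N`) are hypothetical. The sketch shows how restricting every search step to constraint-approved tokens makes each completed sequence valid by construction, while the search itself optimizes a task-specific reward.

```python
import math
import random

EOS = "<eos>"
MAX_N = 5  # assumed bound on n so that every rollout terminates


def allowed(prefix):
    """Tokens that keep `prefix` extendable to a^n b^n c^n (1 <= n <= MAX_N)."""
    a, b, c = prefix.count("a"), prefix.count("b"), prefix.count("c")
    if a == 0:
        return ["a"]
    if b == 0:  # still inside the a-block
        return (["a"] if a < MAX_N else []) + ["b"]
    if c == 0:  # inside the b-block
        return ["b"] if b < a else ["c"]
    return ["c"] if c < a else [EOS]  # inside the c-block


def reward(seq, target_n):
    """Toy task goal: prefer the string whose n equals target_n."""
    return 1.0 if seq.count("a") == target_n else 0.0


def rollout(prefix, rng):
    """Complete a prefix by sampling uniformly among allowed tokens."""
    seq = list(prefix)
    while not seq or seq[-1] != EOS:
        seq.append(rng.choice(allowed(tuple(seq))))
    return tuple(seq)


def mcts_decode(target_n, iterations=200, c_uct=1.4, seed=0):
    rng = random.Random(seed)
    visits, value = {(): 0}, {(): 0.0}
    for _ in range(iterations):
        node, path = (), [()]
        # Selection and expansion: only constraint-approved tokens are considered.
        while not (node and node[-1] == EOS):
            children = [node + (t,) for t in allowed(node)]
            unexpanded = [ch for ch in children if ch not in visits]
            if unexpanded:
                node = rng.choice(unexpanded)
                visits[node], value[node] = 0, 0.0
                path.append(node)
                break
            parent = node
            node = max(
                children,
                key=lambda ch: value[ch] / visits[ch]
                + c_uct * math.sqrt(math.log(visits[parent]) / visits[ch]),
            )
            path.append(node)
        # Simulation and backpropagation.
        r = reward(rollout(node, rng), target_n)
        for n in path:
            visits[n] += 1
            value[n] += r
    # Read out the most-visited path; fall back to a constrained rollout.
    node = ()
    while not (node and node[-1] == EOS):
        children = [node + (t,) for t in allowed(node) if node + (t,) in visits]
        if not children:
            node = rollout(node, rng)
            break
        node = max(children, key=lambda ch: visits[ch])
    return "".join(node[:-1])  # drop the end-of-sequence token
```

Calling `mcts_decode(target_n=3)` returns a string of the form $a^n b^n c^n$ regardless of the random seed, because invalid tokens are never expanded or sampled; whether the search also reaches the rewarded $n$ depends on the search budget. In a realistic setting, the uniform rollout policy would presumably be replaced by the LLM's next-token distribution masked by the constraint checker.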
