TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation
Henrijs Princis, Arindam Sharma, Cristina David
TL;DR
TreeCoder is a unified, modular framework for constraint-based decoding in LLM code generation that treats decoding strategies and constraint functions as interchangeable components of a decoding tree. It supports automated exploration and optimisation of decoding configurations (via Bayesian methods) and enforces constraints during decoding, yielding substantial accuracy gains on MBPP and Spider across multiple open models. The approach decouples prompt engineering from generation-time constraints, is compared rigorously against existing frameworks, and shows that even lightweight constraints can realign misbehaving models toward syntactically valid, executable code. Its significance lies in providing a practical, extensible toolbox for systematic design-space exploration of decoding in code generation, with broad applicability to constrained generation tasks.
Abstract
Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and flexible framework to date for exploring decoding strategies, constraints, and hyperparameters in LLMs, and use it in code generation to enforce correctness and structure during decoding rather than relying on prompt engineering. TreeCoder represents decoding as a tree search over candidate programs, where both decoding strategies and constraint functions (such as style, syntax, and execution) are treated as first-class, optimisable components. This design enables systematic exploration and automatic tuning of decoding configurations using standard optimisation techniques. Experiments on the MBPP (Python) and Spider (SQL) benchmarks show that TreeCoder consistently improves accuracy across open-source models such as CodeLlama, Mistral and DeepSeek, often outperforming their unconstrained baselines by considerable margins.
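To make the idea of decoding as a constrained tree search concrete, the following is a minimal sketch and not TreeCoder's actual implementation or API. It assumes a toy stand-in for the model (`toy_next_token_logprobs`), a single hypothetical constraint (`balanced_prefix`, which rejects unbalanced parentheses), and plain beam search as the decoding strategy. All names are illustrative; the point is only that the strategy and the constraints are separate, swappable arguments.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Tokens = Tuple[str, ...]
Constraint = Callable[[Tokens], bool]
EOS = "<eos>"


def toy_next_token_logprobs(prefix: Tokens) -> Dict[str, float]:
    """Hypothetical stand-in for an LLM: a fixed distribution that favours ')'."""
    probs = {"(": 0.3, ")": 0.4, "x": 0.2, EOS: 0.1}
    return {tok: math.log(p) for tok, p in probs.items()}


def balanced_prefix(seq: Tokens) -> bool:
    """Illustrative constraint: never close an unopened paren; be balanced at EOS."""
    depth = 0
    for tok in seq:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return False
        elif tok == EOS:
            return depth == 0
    return True


def constrained_beam_search(
    next_logprobs: Callable[[Tokens], Dict[str, float]],
    constraints: Sequence[Constraint],
    beam_width: int = 3,
    max_len: int = 6,
) -> List[Tuple[Tokens, float]]:
    """Expand the decoding tree level by level, pruning nodes that violate a constraint."""
    beams: List[Tuple[Tokens, float]] = [((), 0.0)]
    finished: List[Tuple[Tokens, float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[Tokens, float]] = []
        for seq, score in beams:
            for tok, logprob in next_logprobs(seq).items():
                child = seq + (tok,)
                # Constraints are checked on every partial program, during decoding.
                if not all(check(child) for check in constraints):
                    continue
                target = finished if tok == EOS else candidates
                target.append((child, score + logprob))
        # Keep only the highest-scoring surviving branches.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    finished.sort(key=lambda item: item[1], reverse=True)
    return finished


constrained = constrained_beam_search(toy_next_token_logprobs, [balanced_prefix])
unconstrained = constrained_beam_search(toy_next_token_logprobs, [])
```

In this toy setting the unconstrained search returns some sequences with unbalanced parentheses, while the constrained search returns only balanced ones, even though the toy model itself prefers `)`. Swapping `balanced_prefix` for a parser-based or execution-based check, or replacing beam search with another strategy, would leave the rest of the sketch unchanged.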
