AnCoder: Anchored Code Generation via Discrete Diffusion Models
Anton Xue, Litu Rout, Constantine Caramanis, Sanjay Shakkottai
TL;DR
This paper addresses the fragility of diffusion-based code generation by introducing AnchorTree, a hierarchical soft anchoring framework that uses the abstract syntax tree (AST) to prioritize the learning and denoising of syntactically and semantically important tokens. By coupling an anchor network with a denoiser in a two-stage architecture and training with an Anchored Negative ELBO objective, AnCoder learns to respect code structure. This yields higher syntactic validity and functional correctness on HumanEval and MBPP, outperforming diffusion baselines of similar scale. The work provides a parameter-efficient path to improved code generation and highlights the value of structural priors in diffusion modeling of code, with potential applicability to broader structured tasks. Overall, AnchorTree demonstrates that leveraging hierarchical software representations can narrow the gap between diffusion models and autoregressive baselines in producing executable, high-quality code.
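To make "prioritizing the learning of important tokens" concrete, here is a minimal sketch of a soft-anchored masked-diffusion loss. It is not the paper's Anchored Negative ELBO, whose exact form is not given here. It assumes a standard masked-diffusion cross-entropy with a linear schedule (so masked terms are scaled by 1/t), and it assumes that anchor scores in [0, 1] (as a hypothetical anchor network might output) simply up-weight the loss on salient tokens. The names `soft_anchored_masked_ce`, `anchor`, and `lam` are hypothetical.

```python
import math
from typing import Sequence


def soft_anchored_masked_ce(
    log_probs: Sequence[Sequence[float]],  # per-position log-probabilities over the vocabulary
    targets: Sequence[int],                # ground-truth token id at each position
    masked: Sequence[bool],                # True where the position was masked at level t
    anchor: Sequence[float],               # hypothetical anchor scores in [0, 1]
    t: float,                              # masking level in (0, 1]
    lam: float = 1.0,                      # hypothetical strength of the anchor emphasis
) -> float:
    """Weighted masked cross-entropy: an illustrative sketch, not the paper's objective."""
    total = 0.0
    for lp, y, m, a in zip(log_probs, targets, masked, anchor):
        if m:  # only masked positions contribute, as in masked diffusion training
            weight = 1.0 + lam * a  # salient (high-anchor) tokens get a larger weight
            total += -weight * lp[y]
    # With a linear schedule alpha_t = 1 - t, masked terms are scaled by 1 / t.
    return total / t


# Two masked positions; the first is treated as a fully salient anchor token.
loss = soft_anchored_masked_ce(
    log_probs=[[math.log(0.5), math.log(0.5)], [math.log(0.25), math.log(0.75)]],
    targets=[0, 1],
    masked=[True, True],
    anchor=[1.0, 0.0],
    t=0.5,
)
```

Setting `lam=0` recovers the unweighted masked cross-entropy, so the anchor term acts as a soft prior on where the model spends its capacity rather than a hard constraint.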
Abstract
Diffusion language models offer a compelling alternative to autoregressive code generation, enabling global planning and iterative refinement of complex program logic. However, existing approaches fail to respect the rigid structure of programming languages and, as a result, often produce broken programs that fail to execute. To address this, we introduce AnchorTree, a framework that explicitly anchors the diffusion process using structured, hierarchical priors native to code. Specifically, AnchorTree uses the abstract syntax tree to prioritize resolving syntactically and semantically salient tokens, such as keywords (e.g., if, while) and identifiers (e.g., variable names), thereby establishing a structural scaffold that guides the remaining generation. We validate this framework via AnCoder, a family of models showing that structurally anchored diffusion offers a parameter-efficient path to high-quality code generation.
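As an illustration of AST-derived salience, the sketch below tags each Python token with a priority tier and groups positions into resolution rounds, so that keywords are resolved first, then identifiers, then everything else. The three-tier scheme and the function names `anchor_tiers` and `resolution_rounds` are hypothetical assumptions for illustration; only the standard-library `ast`, `tokenize`, and `keyword` modules are real APIs, and this is not AnchorTree's actual hierarchy.

```python
import ast
import io
import keyword
import tokenize

# Layout-only token types that carry no content to denoise.
_SKIP = {
    tokenize.ENCODING, tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def anchor_tiers(source: str) -> list[tuple[str, int]]:
    """Return (token, tier) pairs; a lower tier means resolve earlier (hypothetical scheme)."""
    # Collect identifier names from the AST: variables, function/class names, parameters.
    identifiers = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            identifiers.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            identifiers.add(node.name)
        elif isinstance(node, ast.arg):
            identifiers.add(node.arg)

    tiers = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            tier = 0  # keywords such as `if`, `while`, `return`
        elif tok.type == tokenize.NAME and tok.string in identifiers:
            tier = 1  # identifiers found in the AST
        else:
            tier = 2  # operators, punctuation, literals, other names
        tiers.append((tok.string, tier))
    return tiers


def resolution_rounds(tiers: list[tuple[str, int]]) -> list[list[int]]:
    """Group token positions by tier, lowest tier (highest priority) first."""
    rounds: dict[int, list[int]] = {}
    for pos, (_, tier) in enumerate(tiers):
        rounds.setdefault(tier, []).append(pos)
    return [rounds[k] for k in sorted(rounds)]


src = "def add(a, b):\n    if a:\n        return a + b\n    return b\n"
tiers = anchor_tiers(src)
rounds = resolution_rounds(tiers)
```

For this snippet the first round contains the positions of `def`, `if`, and both `return` tokens, which together sketch the control-flow scaffold before any operators or punctuation are filled in.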
