Diffusion On Syntax Trees For Program Synthesis

Shreyas Kapur, Erik Jenner, Stuart Russell

TL;DR

This work tackles the lack of runtime feedback in autoregressive code generation by introducing diffusion operating on syntax trees. It defines a forward process of small, grammar-constrained mutations and trains a conditional denoiser to invert these edits, complemented by a tree-edit-path training target to produce meaningful refinements. A value network guides search, enabling beam search to efficiently navigate the program space, and the system is demonstrated on inverse-graphics tasks across CFG-based languages like CSG2D and TinySVG. Empirically, the approach outperforms autoregressive and REPL-based baselines in both repair accuracy and efficiency, while maintaining syntactic validity throughout generation. The work highlights a promising direction for neural program synthesis that integrates feedback from program outputs with structured search, albeit within a narrow DSL scope and with future potential to scale to broader programming domains.

Abstract

Large language models generate code one token at a time. Their autoregressive generation process lacks the feedback of observing the program's output. Training LLMs to suggest edits directly can be challenging due to the scarcity of rich edit data. To address these problems, we propose neural diffusion models that operate on syntax trees of any context-free grammar. Similar to image diffusion models, our method also inverts "noise" applied to syntax trees. Rather than generating code sequentially, we iteratively edit it while preserving syntactic validity, which makes it easy to combine this neural model with search. We apply our approach to inverse graphics tasks, where our model learns to convert images into programs that produce those images. Combined with search, our model is able to write graphics programs, see the execution result, and debug them to meet the required specifications. We additionally show how our system can write graphics programs for hand-drawn sketches.
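The "noise" on syntax trees described above can be illustrated with a minimal sketch. It assumes a hypothetical toy grammar of circles combined by `add` and `sub`, not the paper's actual CSG2D definition, and every name in it (`sample_shape`, `mutate`, `noise`, `is_valid`) is invented for illustration. Each step replaces one randomly chosen subtree with a small freshly sampled one, so every intermediate program stays valid by construction.

```python
import random

# Hypothetical toy grammar (an assumption, not the paper's CSG2D grammar):
#   shape -> ("circle", x, y, r) | ("add", shape, shape) | ("sub", shape, shape)
BINARY_OPS = ("add", "sub")


def sample_shape(rng, max_depth):
    """Sample a random expression; max_depth == 0 forces a leaf."""
    if max_depth == 0 or rng.random() < 0.5:
        return ("circle", rng.randrange(32), rng.randrange(32), rng.randrange(1, 16))
    op = rng.choice(BINARY_OPS)
    return (op, sample_shape(rng, max_depth - 1), sample_shape(rng, max_depth - 1))


def subtree_paths(tree, path=()):
    """Yield the path (tuple of child indices) to every shape node."""
    yield path
    if tree[0] in BINARY_OPS:
        for i in (1, 2):
            yield from subtree_paths(tree[i], path + (i,))


def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path swapped for new_subtree."""
    if not path:
        return new_subtree
    children = list(tree)
    children[path[0]] = replace_at(tree[path[0]], path[1:], new_subtree)
    return tuple(children)


def mutate(tree, rng, max_depth=1):
    """One noising step: replace a random subtree with a small random one."""
    path = rng.choice(list(subtree_paths(tree)))
    return replace_at(tree, path, sample_shape(rng, max_depth))


def is_valid(tree):
    """Check that a tree is derivable from the toy grammar."""
    if tree[0] == "circle":
        return len(tree) == 4 and all(isinstance(v, int) for v in tree[1:])
    return (tree[0] in BINARY_OPS and len(tree) == 3
            and is_valid(tree[1]) and is_valid(tree[2]))


def noise(tree, steps, seed=0):
    """Return the trajectory z_0, z_1, ..., z_steps of mutated programs."""
    rng = random.Random(seed)
    trajectory = [tree]
    for _ in range(steps):
        trajectory.append(mutate(trajectory[-1], rng))
    return trajectory
```

Calling `noise(("sub", ("circle", 16, 16, 8), ("circle", 16, 16, 4)), steps=5)` yields six programs, each of which passes `is_valid`. A denoiser would be trained to step backwards along such a trajectory; that training loop is outside the scope of this sketch.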

Paper Structure

This paper contains 36 sections, 8 equations, 12 figures, 1 algorithm.

Figures (12)

  • Figure 1: Examples of programs recovered by our system. The top row shows a hand-drawn sketch of an icon (left), the recovered program (middle), and the compilation of the recovered program (right). The top two rows are for the constructive solid geometry language (CSG2D-Sketch). The last row is an example output from our TinySVG environment that learns to invert hierarchical programs of shapes and colors. Video examples can be found at https://tree-diffusion.github.io.
  • Figure 2: An overview of our method. Analogously to adding noise in image diffusion, we randomly make small mutations to the syntax trees of programs. We then train a conditional neural model to invert these small mutations. In the above example, we operate in a domain-specific language (DSL) for creating 2D graphics using a constructive solid geometry language. The leftmost panel ($z_0$) shows the target image (bottom) alongside its program as a syntax tree (top). The $y$ value of the circle gets mutated from $16$ to $10$ in the second panel, making the black circle "jump" a little higher. Between $z_1$ and $z_2$, we see that we can mutate the Subtract ($-$) node to a Circle node, effectively deleting it.
  • Figure 3: We train $q_\phi(z_{t - 1} | z_{t}, x_{t}; x_0)$ as a decoder-only vision-language transformer following [vlm]. We use an NF-ResNet as the image encoder, which is a normalizer-free convolutional architecture proposed by [nfresnet]. The image encoder encodes the current image, $x_t$, and the target images, $x_0$. The current program is tokenized according to the vocabulary in our context-free grammar. The decoder first predicts an edit location in the current program, and then the tokens that should replace the content at that location. We constrain the autoregressive decoding to our context-free grammar by masking out the logits of all tokens that are invalid at each step.
  • Figure 4: Performance of our approach in comparison to baseline methods on the CSG2D and TinySVG languages. We give the methods $n = 256$ images from the test set and measure the number of nodes expanded to find a solution. The autoregressive baseline was queried with rejection sampling. Our policy outperforms previous methods, and our policy combined with search helps boost performance further. Error bars show standard deviation across $5$ random seeds.
  • Figure 5: Effects of changing several design decisions of our system. We train smaller models on the Rainbow environment. We give the model $n = 256$ test problems to solve. In (a), for No Reverse Path, we train the model without computing an explicit reverse path, only using the last step of the noising process as targets. For No Current Image, we train a model that does not get to see the compiled output image of the program it is editing. For No Noising, instead of using our noising process, we generate two random expressions and use the path between them as targets. In (b), we further examine the effect of the training mixture, which ranges from forward diffusion only ($\rho = 0.0$) to pure random initialization ($\rho = 1.0$). Error bars show standard deviation across $5$ random seeds.
  • ...and 7 more figures
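
The grammar-constrained decoding mentioned in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the six-token vocabulary, the toy grammar, and all function names (`valid_next`, `advance`, `masked_log_probs`, `constrained_greedy_decode`) are hypothetical, and a fixed `logits_fn` stands in for the neural decoder. At each step, the logits of tokens the grammar does not allow are set to negative infinity before normalization, so only syntactically valid sequences can be produced.

```python
import math

# Hypothetical vocabulary and toy grammar (assumptions for illustration):
#   shape -> "(" head ")"
#   head  -> "circle" NUM NUM NUM | "add" shape shape | "sub" shape shape
VOCAB = ["(", ")", "circle", "add", "sub", "NUM"]


def valid_next(stack):
    """Tokens the grammar allows next, given the stack of expected symbols."""
    top = stack[-1]
    if top == "shape":
        return {"("}
    if top == "head":
        return {"circle", "add", "sub"}
    return {top}  # a terminal (")" or "NUM") must appear literally


def advance(stack, token):
    """Consume one valid token and update the expected-symbol stack in place."""
    top = stack.pop()
    if top == "shape":
        stack.extend([")", "head"])  # "head" ends up on top of the stack
    elif top == "head":
        stack.extend(["NUM"] * 3 if token == "circle" else ["shape", "shape"])
    # Terminals need no expansion: popping them is enough.


def masked_log_probs(logits, allowed):
    """Log-softmax over VOCAB with disallowed tokens masked to -inf."""
    masked = [l if tok in allowed else -math.inf for tok, l in zip(VOCAB, logits)]
    m = max(masked)
    log_z = m + math.log(sum(math.exp(v - m) for v in masked))
    return [v - log_z for v in masked]


def constrained_greedy_decode(logits_fn, max_len=64):
    """Greedy decoding where logits_fn(prefix) plays the role of the model."""
    stack, out = ["shape"], []
    while stack and len(out) < max_len:
        log_probs = masked_log_probs(logits_fn(out), valid_next(stack))
        best = max(range(len(VOCAB)), key=log_probs.__getitem__)
        advance(stack, VOCAB[best])
        out.append(VOCAB[best])
    return out
```

Even a stand-in model that always assigns its highest score to `")"`, for example `lambda prefix: [0.0, 5.0, 3.0, 1.0, 1.0, 0.0]`, is forced to emit a complete, well-formed circle expression, because the mask removes `")"` from consideration until the grammar permits it. The edit-location prediction described in the caption is not modeled here.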