Type-Constrained Code Generation with Language Models

Niels Mündler, Jingxuan He, Hao Wang, Koushik Sen, Dawn Song, Martin Vechev

TL;DR

This work introduces a type-constrained decoding approach that leverages type systems to guide code generation and develops novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code.

Abstract

Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although constrained decoding is a promising approach to alleviate this issue, it has only been applied to handle either domain-specific languages or syntactic features of general-purpose programming languages. However, LLMs frequently generate code with typing errors, which are beyond the domain of syntax and generally hard to adequately constrain. To address this challenge, we introduce a type-constrained decoding approach that leverages type systems to guide code generation. For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code. We formalize our approach on a foundational simply-typed language and extend it to TypeScript to demonstrate practicality. Our evaluation on the HumanEval and MBPP datasets shows that our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks across LLMs of various sizes and model families, including state-of-the-art open-weight models with more than 30B parameters. The results demonstrate the generality and effectiveness of our approach in constraining LLM code generation with formal rules of type systems.
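The constrained-decoding idea described in the abstract can be illustrated with a toy rejection loop. This is only a minimal sketch under strong assumptions, not the paper's implementation: the set `WELL_TYPED`, the helpers `is_valid_prefix`, `is_complete`, and `ranked_tokens`, and the token vocabulary are all hypothetical. In particular, the "type checker" is faked by listing the well-typed programs explicitly, whereas the paper builds prefix automata and a search over inhabitable types to decide this for a real type system.

```python
EOS = "<eos>"

# Hypothetical stand-in for a type-aware prefix checker: the set of
# well-typed programs is finite and listed explicitly (an assumption).
WELL_TYPED = {'let x: number = 1;', 'let x: string = "a";'}


def is_valid_prefix(text):
    """True if `text` can still be completed to a well-typed program."""
    return any(program.startswith(text) for program in WELL_TYPED)


def is_complete(text):
    """True if `text` is itself a well-typed program."""
    return text in WELL_TYPED


def ranked_tokens(text):
    """Toy 'model': a fixed preference order standing in for an LLM.

    It ranks the string literal above the number literal, so an
    unconstrained decoder would write `let x: number = "a"`.
    """
    return ['let x: ', 'number', ' = ', '"a"', '1', ';', EOS]


def constrained_decode(max_tokens=16):
    text = ""
    for _ in range(max_tokens):
        for tok in ranked_tokens(text):
            if tok == EOS:
                # Only allow stopping on a complete, well-typed program.
                if is_complete(text):
                    return text
                continue
            # Reject any token that leaves the set of valid prefixes.
            if is_valid_prefix(text + tok):
                text += tok
                break
        else:
            raise RuntimeError("no admissible token")
    raise RuntimeError("token budget exhausted")


print(constrained_decode())  # let x: number = 1;
```

The loop rejects the model's preferred `"a"` after `let x: number = ` because no well-typed program starts with that text, and falls through to the next candidate `1`.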

Paper Structure

This paper contains 62 sections, 6 theorems, 2 equations, 14 figures, 6 tables, 1 algorithm.

Key Result

Lemma 1

If $A$ is a prefix automaton, then $L(A)^p = L_r(A)$.
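A small bounded check can make the statement of the lemma concrete. The sketch below assumes the following reading, which this page does not spell out: $L(A)^p$ is the set of prefixes of accepted words, $L_r(A)$ is the set of words on which $A$ has a run from the start state, and a prefix automaton is one in which every reachable state can still reach an accepting state. The two example automata and all function names are hypothetical, and comparing the languages only up to a length bound is an illustration, not a proof.

```python
from itertools import product


def run(delta, start, word):
    """Follow a partial transition map; return None if the run gets stuck."""
    state = start
    for sym in word:
        state = delta.get((state, sym))
        if state is None:
            return None
    return state


def words(alphabet, max_len):
    """All words over `alphabet` of length at most `max_len`."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)


def reachable_language(delta, start, alphabet, max_len):
    """Words (up to max_len) on which the automaton has a run."""
    return {w for w in words(alphabet, max_len)
            if run(delta, start, w) is not None}


def prefix_language(delta, start, accepting, alphabet, max_len, slack):
    """Prefixes (up to max_len) of accepted words of length <= max_len + slack."""
    accepted = [w for w in words(alphabet, max_len + slack)
                if run(delta, start, w) in accepting]
    return {w[:i] for w in accepted for i in range(min(len(w), max_len) + 1)}


# Accepts (ab)+; every reachable state can still reach accepting state 2.
good = {(0, "a"): 1, (1, "b"): 2, (2, "a"): 1}
# Same automaton plus a dead-end state 3, which breaks the prefix property.
bad = dict(good)
bad[(0, "b")] = 3

print(reachable_language(good, 0, "ab", 4)
      == prefix_language(good, 0, {2}, "ab", 4, 3))   # True
print(reachable_language(bad, 0, "ab", 4)
      - prefix_language(bad, 0, {2}, "ab", 4, 3))     # {'b'}
```

In the second automaton the word `b` is reachable but is not a prefix of any accepted word, which is exactly the situation the prefix property rules out.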

Figures (14)

  • Figure 1: The syntax of $L_B$. Expressions are categorized into base and extension expressions. The latter extend a given expression with suffix operators to form more complicated expressions.
  • Figure 2: Typing rules for $L_B$'s expressions.
  • Figure 3: Type environment extension rules for sequences of statements in $L_B$.
  • Figure 4: $L_B$'s typing rules for function returns.
  • Figure 5: Histogram of the number of iterations consumed by the sample-and-check loop of the decoding algorithm to find a valid token, measured with Gemma 2 2B for HumanEval synthesis.
  • ...and 9 more figures

Theorems & Definitions (7)

  • Definition 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Corollary 1
  • Lemma 4
  • Lemma 5