Table of Contents
Fetching ...

Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis

Zhechong Huang, Zhao Zhang, Ruyi Ji, Tingxuan Xia, Qihao Zhu, Qinxiang Cao, Zeyu Sun, Yingfei Xiong

TL;DR

TyFlow tackles the persistent problem of type errors in LM-generated code by internalizing type reasoning within the generation process. It builds an isomorphic synthesis system where each typing rule corresponds to a synthesis rule, enabling programs to be constructed through synthesis decision sequences that inherently encode type derivations. Grounded in constrained Horn clauses and first-order unification, TyFlow guarantees that generated programs are well-typed, while its LM-driven query resolution learns to apply type rules and generate ground variable assignments. Empirically, TyFlow substantially reduces compilation errors and improves functional correctness across SuFu and Java benchmarks, outperforming rejection-based and separated type-code generation baselines. The approach not only yields practical gains in code quality but also provides a general framework for enforcing complex, CHC-representable constraints during language-model–driven synthesis.

Abstract

Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods, such as constrained decoding, alleviate this problem by externally rejecting untypable code, the model itself does not effectively learn type reasoning internally, which ultimately limits its overall performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation to guide the model to learn the type system. The core of our approach is a novel type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on synthesis decision sequences rather than traditional text-based token sequences. By offloading the complexity of type system learning to the representation itself, models can redirect their computational resources toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of aligning LMs with type systems internally.

Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis

TL;DR

TyFlow tackles the persistent problem of type errors in LM-generated code by internalizing type reasoning within the generation process. It builds an isomorphic synthesis system where each typing rule corresponds to a synthesis rule, enabling programs to be constructed through synthesis decision sequences that inherently encode type derivations. Grounded in constrained Horn clauses and first-order unification, TyFlow guarantees that generated programs are well-typed, while its LM-driven query resolution learns to apply type rules and generate ground variable assignments. Empirically, TyFlow substantially reduces compilation errors and improves functional correctness across SuFu and Java benchmarks, outperforming rejection-based and separated type-code generation baselines. The approach not only yields practical gains in code quality but also provides a general framework for enforcing complex, CHC-representable constraints during language-model–driven synthesis.

Abstract

Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods, such as constrained decoding, alleviate this problem by externally rejecting untypable code, the model itself does not effectively learn type reasoning internally, which ultimately limits its overall performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation to guide the model to learn the type system. The core of our approach is a novel type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on synthesis decision sequences rather than traditional text-based token sequences. By offloading the complexity of type system learning to the representation itself, models can redirect their computational resources toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of aligning LMs with type systems internally.

Paper Structure

This paper contains 58 sections, 12 theorems, 12 equations, 11 figures, 5 tables, 1 algorithm.

Key Result

lemma 1

There is a bijection $\varphi$ between typing rules and synthesis rules.

Figures (11)

  • Figure 1: A code generation task and the incorrect Java programs generated by CodeT5, where the prompt comprises the function signature, input-output examples, and the description "return the first odd element from a list (-1 by default)". We highlight the incorrect code as red and the expected counterpart as green.
  • Figure 2: The isomorphism between a type system and a synthesis system
  • Figure 3: The definition of $\lambda^{\rightarrow}$ in TyFlow, comprising its syntax and some selected typing rules.
  • Figure 4: A training task over $\lambda^{\rightarrow}$.
  • Figure 5: The type derivation tree of the expected program $p^*$ in \ref{['figure:sample-task']}.
  • ...and 6 more figures

Theorems & Definitions (12)

  • lemma 1
  • lemma 2
  • lemma 3
  • lemma 4
  • lemma 5
  • theorem 1: Soundness
  • theorem 2: Completeness
  • corollary 1
  • lemma 6
  • lemma 7
  • ...and 2 more