GramTrans: A Better Code Representation Approach in Code Generation

Zhao Zhang, Qingyuan Liang, Zeyu Sun, Yizhou Chen, Guoqing Wang, Yican Sun, Lu Zhang, Ge Li, Yingfei Xiong

TL;DR

The paper addresses how the representation of code impacts neural code generation, proposing that easier-to-parse representations yield better performance. It formalizes parsing difficulty via grammar classes and validates this conjecture with controlled experiments on a Python-based DSL, showing a strong correlation between parsing simplicity and model accuracy. Building on this insight, it introduces GramTrans, an automatic LL(1) grammar transformation framework plus a bidirectional translator, enabling any CFG to be recoded into an LL(1) representation and used across multiple languages and models. Empirically, GramTrans improves code generation performance on Python and Java across several benchmarks and models, while keeping input length manageable, and a detailed analysis confirms the central role of parsing difficulty in explaining performance gains. The work provides practical guidance for representation design in code generation and offers a scalable, language-agnostic tool for improving model effectiveness.

Abstract

Code generation has shown great promise in assisting software development. A fundamental yet underexplored question is how the choice of code representation affects model performance. While existing studies employ various representations, such as treating code as plain text, grammar rule sequences, or syntax tree sequences, they lack a principled understanding of the relationship between parsing difficulty and model effectiveness. This paper proposes a conjecture: the easier a representation is to parse, the better performance the model achieves. We formalize this idea using grammar classes, where representations in simpler classes (e.g., LL(1)) are easier to parse. Through a controlled experiment on a Python-based DSL, we show that parsing difficulty strongly correlates with model performance. Motivated by this finding, we present GramTrans, a general approach that automatically transforms a context-free language into a representation within the LL(1) class. GramTrans introduces a novel hierarchical conflict elimination algorithm, enabling a flexible trade-off between syntactic simplicity and token efficiency. We evaluate GramTrans on both Python and Java using three code generation models: StarCoder 1B, DeepSeek-Coder 1.3B, and Qwen2.5 1.5B. Across multiple benchmarks, GramTrans consistently delivers significant improvements over baseline representations. Furthermore, our analysis of existing representations reconfirms the strong alignment between parsing difficulty and model performance, providing additional support for the conjecture.
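
To make "easier to parse" concrete at the LL(1) level, the following is a minimal sketch of the textbook FIRST/FOLLOW conflict check. It is not the hierarchical conflict elimination algorithm that GramTrans introduces. The two toy grammars, the token names, and the `<assign>`/`<call>` marker terminals are hypothetical and chosen only for illustration. A grammar is LL(1) when, for every nonterminal, one token of lookahead is enough to choose the production.

```python
EPS = "ε"   # marks "can derive the empty string"
END = "$"   # end-of-input marker

def first_of_seq(seq, first, grammar):
    """FIRST set of a symbol sequence, given FIRST sets of the nonterminals."""
    out = set()
    for sym in seq:
        if sym not in grammar:            # terminal: it is the only possible start
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)                          # empty sequence, or every symbol can vanish
    return out

def compute_first(grammar):
    """Fixed-point computation of FIRST for every nonterminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                new = first_of_seq(prod, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    """Fixed-point computation of FOLLOW for every nonterminal."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue
                    tail = first_of_seq(prod[i + 1:], first, grammar)
                    new = tail - {EPS}
                    if EPS in tail:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def ll1_conflicts(grammar, start):
    """Return (nonterminal, lookahead, prod_i, prod_j) for each LL(1) conflict."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, first, start)
    conflicts = []
    for nt, prods in grammar.items():
        seen = {}                         # lookahead token -> index of first claimant
        for idx, prod in enumerate(prods):
            f = first_of_seq(prod, first, grammar)
            predict = (f - {EPS}) | (follow[nt] if EPS in f else set())
            for tok in sorted(predict):
                if tok in seen:
                    conflicts.append((nt, tok, seen[tok], idx))
                else:
                    seen[tok] = idx
    return conflicts

# Hypothetical toy grammar: both statement forms begin with NAME,
# so one token of lookahead cannot pick the production.
infix = {
    "stmt": [("NAME", "=", "expr"), ("NAME", "(", "expr", ")")],
    "expr": [("NAME",), ("NUM",)],
}

# Hypothetical re-encoding: a distinct leading marker token per production.
prefixed = {
    "stmt": [("<assign>", "NAME", "expr"), ("<call>", "NAME", "expr")],
    "expr": [("NAME",), ("NUM",)],
}

print(ll1_conflicts(infix, "stmt"))
print(ll1_conflicts(prefixed, "stmt"))
```

The first grammar reports a conflict on `NAME` for `stmt`, while the marker-prefixed variant reports none. The price is one extra token per statement, which is a small instance of the trade-off between syntactic simplicity and token efficiency that the abstract describes.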

Paper Structure

This paper contains 46 sections, 5 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Different representations of the program example. The program and its syntax tree are shown on the left, and different representations are illustrated in the middle. The grammar-rule-based representation is derived from the traversal of the dashed boxes (grammar rules), and the syntax-tree-based representation is obtained by traversing the syntax tree. These representations can themselves be viewed as new languages, whose grammars are given on the right.
  • Figure 2: The hierarchical structure of grammars.
  • Figure 3: Overview of validation on a DSL.
  • Figure 4: An example from the MathQA dataset (<x> is a new terminal).
  • Figure 5: Grammar hierarchy of DSLs
  • ...and 3 more figures