ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using LLMs

Azmain Kabir, Shaowei Wang, Yuan Tian, Tse-Hsun Chen, Muhammad Asaduzzaman, Wenbin Zhang

TL;DR

The paper tackles the challenge of incomplete and uncompilable code snippets from Q&A sites by introducing ZS4C, a zero-shot, two-stage LLM-based approach that first infers missing import statements and then iteratively fixes errors with a validator. It demonstrates that coupling an LLM with a compiler through structured prompts and self-consistency significantly boosts compilation rates and import accuracy on Java (StatType-SO) and Python (Python-SO) datasets, outperforming the state-of-the-art SnR and a BasePrompt baseline, especially with GPT-4. The work includes extensive ablations, parameter analyses, and a discussion of limitations, cost, and failure modes, providing practical guidance for adopting LLM-assisted code synthesis in realistic settings. A public replication package and dataset are released to enable further research on partial code synthesis and code repair through LLMs.
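The two-stage flow described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `propose_imports` and `repair` stand in for LLM calls, `num_samples` and `max_rounds` are hypothetical parameters, the majority-vote rule is one plausible reading of "self-consistency", and Python's built-in `compile()` serves as a stand-in validator that checks syntax only (it does not resolve imports the way a Java compiler would).

```python
from collections import Counter
from typing import Callable, List, Optional


def vote_on_imports(samples: List[List[str]]) -> List[str]:
    """Keep imports proposed by a strict majority of samples (assumed rule)."""
    counts = Counter(imp for sample in samples for imp in set(sample))
    threshold = len(samples) / 2
    return sorted(imp for imp, n in counts.items() if n > threshold)


def python_validator(source: str) -> Optional[str]:
    """Stand-in validator: return an error message, or None if the source parses."""
    try:
        compile(source, "<snippet>", "exec")
        return None
    except SyntaxError as exc:
        return f"{exc.msg} (line {exc.lineno})"


def synthesize(
    snippet: str,
    propose_imports: Callable[[str], List[str]],  # hypothetical LLM call
    repair: Callable[[str, str], str],            # hypothetical LLM call
    num_samples: int = 3,
    max_rounds: int = 5,
) -> Optional[str]:
    # Stage 1: sample import proposals several times and aggregate them.
    samples = [propose_imports(snippet) for _ in range(num_samples)]
    code = "\n".join(vote_on_imports(samples) + [snippet])

    # Stage 2: feed validator errors back to the repair step until it passes.
    for _ in range(max_rounds):
        error = python_validator(code)
        if error is None:
            return code
        code = repair(code, error)
    return code if python_validator(code) is None else None


# Deterministic stubs replacing the LLM, for demonstration only.
def fake_propose(snippet: str) -> List[str]:
    return ["import json"]


def fake_repair(code: str, error: str) -> str:
    return code.replace("print json.dumps(data)", "print(json.dumps(data))")


if __name__ == "__main__":
    broken = 'data = {"a": 1}\nprint json.dumps(data)'
    print(synthesize(broken, fake_propose, fake_repair))
```

In this sketch the loop gives up and returns `None` once the round budget is exhausted, which mirrors the idea that some snippets remain unfixed after a bounded number of conversation rounds.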

Abstract

Technical Q&A sites are valuable for software developers seeking knowledge, but the code snippets they provide are often uncompilable and incomplete due to unresolved types and missing libraries. This poses a challenge for users who wish to reuse or analyze these snippets. Existing methods either do not focus on creating compilable code or have low success rates. To address this, we propose ZS4C, a lightweight approach for zero-shot synthesis of compilable code from incomplete snippets using Large Language Models (LLMs). ZS4C operates in two stages: first, it uses an LLM, such as GPT-3.5, to identify missing import statements in a snippet; second, it collaborates with a validator (e.g., a compiler) to fix compilation errors caused by incorrect imports and syntax issues. We evaluated ZS4C on the StatType-SO benchmark and a new dataset, Python-SO, which includes 539 Python snippets from Stack Overflow across the 20 most popular Python libraries. ZS4C significantly outperforms existing methods, improving the compilation rate from 63% to 95.1% compared to the state-of-the-art SnR, marking a 50.1% improvement. On average, ZS4C also infers more accurate import statements than SnR, reaching an F1 score of 0.98, an 8.5% improvement in F1.
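To make the import-accuracy metric concrete, the sketch below computes precision, recall, and F1 for one snippet by comparing predicted imports against expected ones as sets. This is the standard F1 definition; how the paper aggregates scores across snippets is not stated in this summary, and the function name `import_f1` and the example imports are assumptions for illustration.

```python
from typing import Iterable


def import_f1(predicted: Iterable[str], expected: Iterable[str]) -> float:
    """F1 between predicted and expected import statements, compared as sets."""
    pred, gold = set(predicted), set(expected)
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: one spurious import lowers precision to 2/3.
    predicted = ["java.util.List", "java.util.Map", "java.io.File"]
    expected = ["java.util.List", "java.util.Map"]
    print(round(import_f1(predicted, expected), 3))
```

A spurious import hurts precision while a missing import hurts recall, so F1 penalizes both over-importing and under-importing.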

Paper Structure (35 sections, 8 figures, 10 tables, 1 algorithm)

Figures (8)

  • Figure 1: Overview of our approach ZS4C.
  • Figure 2: An example of a code snippet fixed by ConversationFixing. The left side shows the snippet before, and the right side after, applying ConversationFixing.
  • Figure 3: The impact of parameters $K$ (left) and $M$ (right) on ZS4C$_{\text{GPT-3.5}}$ for StatType-SO.
  • Figure 4: An example that eventually failed after 15 rounds of conversation.
  • Figure 5: An example of "Symbol Not Found" error fixes using ZS4C for StatType-SO.
  • ...and 3 more figures