ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using LLMs
Azmain Kabir, Shaowei Wang, Yuan Tian, Tse-Hsun Chen, Muhammad Asaduzzaman, Wenbin Zhang
TL;DR
The paper tackles the problem of incomplete, uncompilable code snippets on Q&A sites by introducing ZS4C, a zero-shot, two-stage LLM-based approach that first infers missing import statements and then iteratively fixes errors with the help of a validator. It demonstrates that coupling an LLM with a compiler through structured prompts and self-consistency significantly boosts compilation rates and import accuracy on Java (StatType-SO) and Python (Python-SO) datasets, outperforming both the state-of-the-art SnR and a BasePrompt baseline, especially with GPT-4. The work includes extensive ablations, parameter analyses, and a discussion of limitations, cost, and failure modes, providing practical guidance for adopting LLM-assisted code synthesis in realistic settings. A public replication package and dataset are released to enable further research on partial code synthesis and LLM-based code repair.
Abstract
Technical Q&A sites are valuable for software developers seeking knowledge, but the code snippets they provide are often uncompilable and incomplete due to unresolved types and missing libraries. This poses a challenge for users who wish to reuse or analyze these snippets. Existing methods either do not focus on creating compilable code or have low success rates. To address this, we propose ZS4C, a lightweight approach for zero-shot synthesis of compilable code from incomplete snippets using Large Language Models (LLMs). ZS4C operates in two stages: first, it uses an LLM, such as GPT-3.5, to identify missing import statements in a snippet; second, it collaborates with a validator (e.g., a compiler) to fix compilation errors caused by incorrect imports and syntax issues. We evaluated ZS4C on the StatType-SO benchmark and a new dataset, Python-SO, which includes 539 Python snippets from Stack Overflow across the 20 most popular Python libraries. ZS4C significantly outperforms existing methods, raising the compilation rate from 63% for the state-of-the-art SnR to 95.1%, a 50.1% relative improvement. On average, ZS4C also infers import statements more accurately than SnR, reaching an F1 score of 0.98, an improvement of 8.5% in F1.
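The two-stage design described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `validate`, `synthesize`, `fake_infer_imports`, and `fake_repair` are hypothetical, the two `fake_*` functions are hard-coded stand-ins for LLM calls, and the validator is assumed to be Python's built-in `compile()`, which only checks syntax and does not resolve imports or types the way a real compiler would.

```python
from typing import Callable, List, Optional, Tuple


def validate(source: str) -> Optional[str]:
    """Assumed validator: return None if the source parses, else an error message."""
    try:
        compile(source, "<snippet>", "exec")  # syntax check only, nothing is executed
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def synthesize(
    snippet: str,
    infer_imports: Callable[[str], List[str]],
    repair: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Stage 1: prepend inferred imports. Stage 2: repair until the validator passes."""
    candidate = "\n".join(infer_imports(snippet) + [snippet])
    for _ in range(max_rounds):
        error = validate(candidate)
        if error is None:
            return candidate, True
        # Feed the validator's message back to the (stand-in) model.
        candidate = repair(candidate, error)
    return candidate, validate(candidate) is None


# Hypothetical stand-ins for the two LLM prompts.
def fake_infer_imports(snippet: str) -> List[str]:
    return ["import numpy as np"]


def fake_repair(source: str, error: str) -> str:
    return source.replace("def f(x)\n", "def f(x):\n")


broken = "def f(x)\n    return np.sqrt(x)"  # missing colon and missing import
fixed, ok = synthesize(broken, fake_infer_imports, fake_repair)
print(ok)
print(fixed)
```

In this sketch the loop stops as soon as the validator accepts the candidate or the round budget (`max_rounds`, an assumed parameter) is exhausted, so a snippet that cannot be repaired is returned with a `False` flag rather than looping indefinitely.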
