Private-Library-Oriented Code Generation with Large Language Models
Daoguang Zan, Bei Chen, Yongshun Gong, Junzhi Cao, Fengji Zhang, Bingchao Wu, Bei Guan, Yilong Yin, Yongji Wang
TL;DR
This work tackles private-library-oriented code generation by introducing a retrieval-augmented framework with two modules: APIFinder, which retrieves relevant APIs from API documentation using a dense dual-encoder retriever, and APICoder, which generates code that invokes these APIs. The authors further improve the generator through CodeGenAPI, a continually pre-trained variant whose training data places API information before code blocks. To evaluate the approach, they create four private-library benchmarks (TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval) and demonstrate that API basics and examples are the most beneficial prompt components, with CodeGenAPI providing consistent gains over off-the-shelf baselines. The results show that private-library code generation is feasible with retrieval-augmented prompting, though challenges remain, such as handling noisy retrieved APIs, addressing recurring error types, and scaling to large API sets. These point to directions for future tooling and privacy-aware development.
Abstract
Large language models (LLMs), such as Codex and GPT-4, have recently showcased remarkable code generation abilities, significantly boosting coding efficiency. This paper delves into using LLMs for code generation with private libraries, which are widely employed in everyday programming. Despite their remarkable capabilities, LLMs find it difficult to generate code that invokes private APIs, as they inherently lack exposure to these private libraries during pre-training. To address this challenge, we propose a novel framework that emulates the process of programmers writing private code. This framework comprises two modules: APIFinder first retrieves potentially useful APIs from API documentation, and APICoder then leverages these retrieved APIs to generate private code. Specifically, APIFinder employs vector retrieval techniques and allows user involvement in the retrieval process. APICoder can directly use off-the-shelf code generation models. To further cultivate explicit proficiency in invoking APIs from prompts, we continually pre-train a reinforced version of APICoder, named CodeGenAPI. Our goal is to train the above two modules on vast public libraries, enabling generalization to private ones. Meanwhile, we create four private library benchmarks, namely TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval, and meticulously handcraft test cases for each benchmark to support comprehensive evaluations. Extensive experiments on the four benchmarks consistently affirm the effectiveness of our approach. We also conduct deeper analysis to glean additional insights.
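The two-module flow described above (retrieve candidate APIs, then place them in the prompt ahead of the task) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `mylib` API entries are hypothetical, the bag-of-words `embed` function is a toy stand-in for the learned dense encoder, and the prompt layout is one plausible format rather than the one used by the authors. No generation model is called; the sketch stops at the prompt that a model such as APICoder would receive.

```python
import math
import re
from collections import Counter

# Hypothetical documentation for a made-up private library (not a real API).
API_DOCS = {
    "mylib.load_table(path)": "Load a table from a csv file path.",
    "mylib.drop_missing(table)": "Remove rows that contain missing values from a table.",
    "mylib.join_tables(left, right, key)": "Join two tables on a shared key column.",
}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned dense encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_apis(query, docs, top_k=2):
    """Rank API entries by similarity to the query and keep the top_k."""
    q = embed(query)
    ranked = sorted(
        docs.items(),
        key=lambda item: cosine(q, embed(item[0] + " " + item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, retrieved):
    """Place retrieved API information before the task description."""
    lines = ["# Potentially useful APIs:"]
    lines += [f"# {signature}: {description}" for signature, description in retrieved]
    lines.append(f"# Task: {query}")
    return "\n".join(lines) + "\n"


query = "remove rows with missing values from the table"
retrieved = find_apis(query, API_DOCS)
prompt = build_prompt(query, retrieved)
print(prompt)
```

In a fuller pipeline, the retrieved list could be shown to the user for confirmation before the prompt is built, which corresponds to the user involvement in retrieval mentioned in the abstract.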
