Large Language Model-Aware In-Context Learning for Code Generation

Jia Li; Ge Li; Chongyang Tao; Jia Li; Huangzhao Zhang; Fang Liu; Zhi Jin

TL;DR

This work tackles the instability of in-context learning for code generation by introducing LAIL, a learning-based, LLM-aware example selection method. LAIL uses the LLM's own generation probabilities of ground-truth programs to label candidate in-context examples, then trains a neural retriever with contrastive learning to select the examples that form the prompt. By aligning the retriever with actual LLM preferences and using BM25 to seed candidates, LAIL achieves state-of-the-art Pass@1 on CodeGen and GPT-3.5 across MBPP, MBJP, and MBCPP, with human evaluation favoring its outputs on correctness, quality, and maintainability. The results show that the learned retriever transfers well across LLMs and datasets, and they highlight the importance of the number of in-context examples and of the design of the probability-based feedback signal. Overall, LAIL offers a practical way to improve code generation through better prompt construction, and it applies across programming languages and model families.

Abstract

Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation. LLMs take a prompt consisting of requirement-code examples and a new requirement as input, and output a new program. Existing studies have found that ICL performance depends heavily on the chosen examples, which has motivated research on example selection. However, existing approaches either select examples randomly or retrieve them based only on the textual similarity of requirements, leading to sub-optimal performance. In this paper, we propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation. Given a candidate example, we use the LLM itself to evaluate it by considering the generation probability of the ground-truth program given a requirement and the example. We then label candidate examples as positive or negative based on this probability feedback. Based on the labeled data, we introduce a contrastive learning objective to train an effective retriever that captures the preferences of LLMs in code generation. We apply LAIL to three LLMs and evaluate it on three representative datasets (i.e., MBJP, MBPP, and MBCPP). LAIL outperforms the state-of-the-art baselines by 11.58%, 6.89%, and 5.07% on CodeGen, and 4.38%, 2.85%, and 2.74% on GPT-3.5 in terms of Pass@1, respectively.
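
The labeling and training steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the length-normalised log-probability score, the top-k/bottom-k split into positives and negatives, and the InfoNCE form of the contrastive loss are all assumptions made for illustration. The per-token log-probabilities are assumed to come from the LLM when it is conditioned on a candidate example and the requirement.

```python
import math
from typing import List, Sequence, Tuple


def sequence_log_prob(token_log_probs: Sequence[float]) -> float:
    """Score a candidate example by the mean log-probability the LLM assigns
    to the ground-truth program's tokens (length normalisation is an assumption)."""
    return sum(token_log_probs) / len(token_log_probs)


def label_candidates(
    scores: Sequence[float], top_k: int, bottom_k: int
) -> Tuple[List[int], List[int]]:
    """Rank candidates by score and return (positive indices, negative indices).
    Taking the highest-scored as positives and the lowest-scored as negatives
    is one plausible labeling rule, assumed here."""
    if top_k + bottom_k > len(scores):
        raise ValueError("top_k + bottom_k exceeds the number of candidates")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k], order[len(order) - bottom_k:]


def info_nce(sim_pos: float, sim_negs: Sequence[float], temperature: float = 1.0) -> float:
    """InfoNCE-style contrastive loss for one requirement: the negative log of the
    softmax weight of the positive's similarity among positive and negatives."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


if __name__ == "__main__":
    # Hypothetical scores for four candidate examples of one training requirement.
    scores = [-0.2, -1.5, -0.7, -3.0]
    positives, negatives = label_candidates(scores, top_k=1, bottom_k=2)
    print("positives:", positives, "negatives:", negatives)
    # Hypothetical retriever similarities between the requirement and each example.
    print("loss:", info_nce(sim_pos=2.0, sim_negs=[0.0, 0.0]))
```

Minimising such a loss pushes the retriever to score LLM-preferred examples above the others, which is the alignment the abstract describes; in practice the similarities would come from a trainable encoder rather than fixed numbers.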

Paper Structure (27 sections, 6 equations, 4 figures, 7 tables)

Introduction
Background
Large Language Models
In-Context Learning
Method: LAIL
Estimate and Label Examples
Training Neural Retriever
Inference
Study Design
Research Questions
Datasets
Evaluation Metrics
Baselines
Base Large Language Models
Implementation Details
...and 12 more sections

Figures (4)

Figure 1: The top-3 examples selected by the random, BM25, and LAIL approaches.
Figure 2: Overview of LAIL. LAIL uses LLMs themselves to estimate candidate examples and label them as positive or negative (A). Based on the labeled data, LAIL then trains a retriever with a contrastive loss to align it with the preferences of LLMs (B). Given a test requirement, the optimized retriever selects several examples as a prompt that is fed to LLMs for code generation (C).
Figure 3: Results of transferring the retriever trained on one dataset (row) to others (column) on GPT-3.5 in MBPP dataset.
Figure 4: Performance with different numbers of in-context examples on GPT-3.5 on the MBPP dataset.
