What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond

Wenchao Gu, Juntao Chen, Yanlin Wang, Tianyue Jiang, Xingzhe Li, Mingwei Liu, Xilin Liu, Yuchi Ma, Zibin Zheng

TL;DR

Repository-level code generation suffers from long-context constraints and diverse dependencies. The paper conducts an empirical analysis showing that in-context code and API information markedly improve LLM performance, while retrieved similar code can introduce noise. It introduces AllianceCoder, a three-stage, context-integrated retrieval framework that generates natural language API descriptions, decomposes user queries into implementation steps, and retrieves APIs via semantic descriptions for improved code generation, achieving state-of-the-art results with up to 20% gains in Pass@1 on CoderEval and RepoExec. The study also analyzes API prediction and compares natural-language API descriptions to code snippets for retrieval, finding that semantic API descriptions provide more robust guidance and that text-based retrieval often outperforms code-based retrieval. Overall, targeted, API-centered retrieval combined with chain-of-thought planning significantly enhances repository-level code generation and offers practical improvements for real-world programming tasks.

Abstract

Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generation (RAG) frameworks are widely adopted, the effectiveness of different retrieved information sources (contextual code, APIs, and similar snippets) has not been rigorously analyzed. Through an empirical study on two benchmarks, we demonstrate that in-context code and potential API information significantly enhance LLM performance, whereas retrieved similar code often introduces noise, degrading results by up to 15%. Based on the preliminary results, we propose AllianceCoder, a novel context-integrated method that employs chain-of-thought prompting to decompose user queries into implementation steps and retrieves APIs via semantic description matching. Through extensive experiments on CoderEval and RepoExec, AllianceCoder achieves state-of-the-art performance, improving Pass@1 by up to 20% over existing approaches.
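To make "retrieves APIs via semantic description matching" concrete, here is a minimal sketch of the idea, not the authors' implementation. It assumes the implementation steps have already been produced (here they are hard-coded rather than generated by chain-of-thought prompting), and it swaps in a bag-of-words cosine similarity for whatever embedding model the paper uses. The API names, their descriptions, and the helper `retrieve_apis` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical repository APIs with natural-language descriptions.
# In the described framework these descriptions would be generated by an LLM.
API_DESCRIPTIONS = {
    "load_config": "read a configuration file from disk and return its settings as a dictionary",
    "open_connection": "open a network connection to a database server",
    "format_report": "render query results as a text report",
}

# Hypothetical implementation steps; a stand-in for the output of
# chain-of-thought query decomposition.
STEPS = [
    "read the settings from the configuration file",
    "open a connection to the database server",
]


def retrieve_apis(steps, api_descriptions, top_k=1):
    """For each step, pick the top_k APIs whose description is most similar."""
    vectors = {name: Counter(tokenize(desc)) for name, desc in api_descriptions.items()}
    selected = []
    for step in steps:
        query = Counter(tokenize(step))
        ranked = sorted(vectors, key=lambda n: cosine(query, vectors[n]), reverse=True)
        for name in ranked[:top_k]:
            if name not in selected:  # keep first-seen order, drop duplicates
                selected.append(name)
    return selected


if __name__ == "__main__":
    print(retrieve_apis(STEPS, API_DESCRIPTIONS))
```

The point of the sketch is the matching direction: each step is compared against text descriptions of APIs rather than against their source code, which is the text-versus-code retrieval contrast the TL;DR refers to. The selected API names would then be placed in the generation prompt.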

Paper Structure

This paper contains 38 sections, 6 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Intersection of correct answers across ConAPI, API, and Context under various LLMs and datasets.
  • Figure 2: Comparison of Input Prompt Lengths for Test Cases: Success in Both ConAPI & API vs. API-Only Success Across Different Datasets and LLMs.
  • Figure 3: AllianceCoder framework.