R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

Ken Deng, Jiaheng Liu, He Zhu, Congnan Liu, Jingxin Li, Jiakai Wang, Peng Zhao, Chenchen Zhang, Yanan Wu, Xueqiao Yin, Yuanxing Zhang, Zizheng Zhan, Wenbo Su, Bangyu Xiang, Tiezheng Ge, Bo Zheng

TL;DR

R2C2-Coder addresses the gap in repository-level code completion with two contributions: a retrieval-augmented prompt construction framework (R2C2-Enhance) that extracts abstract and snippet contexts using Tree-sitter, and a challenging benchmark (R2C2-Bench) that applies context perturbations across four languages. The approach builds a retrieval pool, assembles prompts from the retrieved contexts while respecting a token limit of $N=4096$, and demonstrates substantial improvements over in-file baselines, with further gains from fine-tuning on the benchmark's training split. The work shows that explicit cross-file context, especially abstract context, and robust retrieval strategies significantly boost real-world code completion, and it provides a scalable, multi-language benchmark for future research. Overall, R2C2-Coder offers practical methods for enhancing Code LLMs on repository-level tasks and a comprehensive dataset for rigorously evaluating such capabilities.

Abstract

Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing repository-level code completion methods often fall short of fully using the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies. Besides, existing benchmarks usually focus on limited code completion scenarios, which cannot adequately reflect the repository-level code completion abilities of existing methods. To address these limitations, we propose R2C2-Coder to enhance and benchmark the real-world repository-level code completion abilities of code Large Language Models; R2C2-Coder includes a code prompt construction method, R2C2-Enhance, and a well-designed benchmark, R2C2-Bench. Specifically, in R2C2-Enhance, we first construct the candidate retrieval pool and then assemble the completion prompt by retrieving from this pool for each completion cursor position. Then, based on R2C2-Enhance, we construct a more challenging and diverse R2C2-Bench with training, validation, and test splits, where a context perturbation strategy is proposed to better simulate real-world repository-level code completion. Extensive results on multiple benchmarks demonstrate the effectiveness of R2C2-Coder.
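
The retrieve-then-assemble flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` type, the `build_prompt` function, the regex tokenizer, the Jaccard scoring, the `query_lines` window, and the ordering of retrieved contexts before the in-file prefix are all hypothetical stand-ins. Only the idea of a pre-built retrieval pool, a query formed from the prefix and suffix around the cursor, and a token limit (4096 in the TL;DR) comes from the text.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str  # file the context was extracted from
    text: str  # an abstract or snippet context


def tokenize(text: str) -> list[str]:
    # Identifier-level split; a stand-in for a real model tokenizer (assumption).
    return re.findall(r"\w+", text)


def jaccard(a: set[str], b: set[str]) -> float:
    # Token-set overlap; a placeholder for whatever retriever is actually used.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_prompt(prefix: str, suffix: str, pool: list[Candidate],
                 token_limit: int = 4096, query_lines: int = 10) -> str:
    # 1. Form the retrieval query from the lines nearest the cursor.
    query = "\n".join(prefix.splitlines()[-query_lines:]
                      + suffix.splitlines()[:query_lines])
    q_tokens = set(tokenize(query))

    # 2. Rank the pre-built candidate pool against the query.
    ranked = sorted(pool,
                    key=lambda c: jaccard(q_tokens, set(tokenize(c.text))),
                    reverse=True)

    # 3. Greedily add retrieved contexts until the token budget is spent.
    #    The in-file prefix and suffix are reserved first.
    budget = token_limit - len(tokenize(prefix)) - len(tokenize(suffix))
    chosen = []
    for cand in ranked:
        block = f"# {cand.path}\n{cand.text}\n"
        cost = len(tokenize(block))
        if cost > budget:
            continue  # skip contexts that do not fit
        chosen.append(block)
        budget -= cost

    # Retrieved contexts first, then the in-file prefix. Where the suffix
    # goes depends on the model's fill-in-the-middle format.
    return "".join(chosen) + prefix


pool = [
    Candidate("utils/io.py", "def load_config(path): ..."),
    Candidate("models/net.py", "class Net: def forward(self, x): ..."),
]
print(build_prompt("cfg = load_config(path)\n", "", pool, token_limit=12))
```

With the small `token_limit` in the usage lines, only the most relevant candidate fits the budget; with the default limit, both candidates are included in ranked order.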

Paper Structure (26 sections, 12 figures, 10 tables)

This paper contains 26 sections, 12 figures, 10 tables.

Figures (12)

  • Figure 1: Examples of code snippets.
  • Figure 2: Overview of our R$^2$C$^2$-Enhance. For the current completion cursor position, we first generate the retrieval query using the prefix and suffix contexts. Then, we perform context retrieval between the retrieval query and the pre-constructed candidate retrieval pool to produce the retrieved contexts. After that, we use the in-file context of the current code and the retrieved contexts to assemble the completion prompt, which is then sent to LLMs to generate the completion response.
  • Figure 3: An example of the abstract syntax tree generated by the Tree-sitter tool.
  • Figure 4: Performance of StarCoder-7B + R$^2$C$^2$-Enhanced Tuning at various perturbation rates on the validation set of R$^2$C$^2$-Bench.
  • Figure 5: (a) Statistics of references from 1 to 5 lines in R$^2$C$^2$-Bench and CrossCodeEval+. (b) Exact Match of StarCoder-7B w/ R$^2$C$^2$-Enhanced tuning on R$^2$C$^2$-Bench and CrossCodeEval+ when expected output varies from 1 to 5 lines. (c) Edit Similarity of StarCoder-7B w/ R$^2$C$^2$-Enhanced tuning on R$^2$C$^2$-Bench and CrossCodeEval+ when expected output varies from 1 to 5 lines.
  • ...and 7 more figures
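
The idea of deriving an abstract context from a syntax tree (Figure 3) can be illustrated with a small sketch. Two assumptions are made loudly here: "abstract context" is taken to mean a declarations-only outline of a file (class and function headers with bodies elided), and Python's built-in `ast` module is swapped in for Tree-sitter so that the example runs without extra dependencies. The function name `abstract_context` is hypothetical.

```python
import ast


def abstract_context(source: str) -> str:
    """Return a declarations-only outline of Python source code.

    Stand-in for a Tree-sitter based extractor (assumption): keeps class
    and function headers and drops their bodies.
    """
    lines: list[str] = []

    def visit(body: list[ast.stmt], depth: int) -> None:
        indent = "    " * depth
        for node in body:
            if isinstance(node, ast.ClassDef):
                lines.append(f"{indent}class {node.name}:")
                visit(node.body, depth + 1)  # descend to collect methods
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Positional parameter names only, to keep the sketch short.
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"{indent}def {node.name}({args}): ...")

    visit(ast.parse(source).body, 0)
    return "\n".join(lines)


example = (
    "class A:\n"
    "    def f(self, x):\n"
    "        return x\n"
    "\n"
    "def g(a, b):\n"
    "    return a + b\n"
)
print(abstract_context(example))
```

An outline like this is much shorter than the full file, which is one plausible reason such contexts fit well under a fixed token limit.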