A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model

Wenrui Zhang; Tiehang Fu; Ting Yuan; Ge Zhang; Dong Chen; Jie Wang

TL;DR

This paper tackles inefficiencies in retrieval-augmented code completion by introducing CARD, a lightweight, uncertainty-based critique plug-in that decides when to retrieve and which of several predictions to accept. An estimator, trained with LightGBM on a 13-feature representation derived from LM outputs, predicts a quality score $\hat{s}$. Two functions, $\text{isRetrieve}$ and $\text{Select}$, use this score to adapt retrieval and acceptance across iterative RAG workflows. Experiments on RepoEval and RepoEval-M across multiple languages and code LMs show that CARD reduces retrieval overhead by up to 46.5% and latency by 16% to 83% while improving ES and, in several settings, EM and UT. The results indicate that CARD generalizes across retrievers, generators, languages, and even unseen LMs, supporting its deployment as a universal, low-cost augmentation to RAG-based code completion systems.

Abstract

Recent advancements in Retrieval-Augmented Generation (RAG) have significantly enhanced code completion at the repository level. Various RAG-based code completion systems have been proposed, built on different design choices; for instance, some gain effectiveness at the cost of repeating the retrieval-generation process multiple times. However, the indiscriminate use of retrieval in current methods causes problems in both efficiency and effectiveness: a considerable portion of retrievals are unnecessary and may introduce unhelpful or even harmful suggestions to code language models. To address these challenges, we introduce CARD, a lightweight critique method that estimates whether a retrieval is necessary and selects the best answer from multiple predictions. CARD can be integrated seamlessly into any RAG-based code completion system. Our evaluation shows that CARD avoids 21% to 46% of retrievals for line completion, 14% to 40% for API completion, and 6% to 46.5% for function completion, while improving accuracy. CARD reduces latency by 16% to 83%. CARD generalizes to different LMs, retrievers, and programming languages. It is lightweight: training takes a few seconds and inference a few milliseconds.
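As a rough illustration of how the two critique functions could slot into an iterative RAG loop, the following is a minimal sketch and not the authors' implementation. It assumes that retrieval is triggered when the estimated quality score falls below a threshold and that selection picks the candidate with the highest estimated score; the `threshold` and `max_iters` values, the callable signatures, and the `complete` helper are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: the source names only isRetrieve and Select.
Scorer = Callable[[str], float]        # plays the role of the estimator producing s_hat
Retriever = Callable[[str, str], str]  # (context, latest prediction) -> retrieved snippets
Generator = Callable[[str, str], str]  # (context, retrieved snippets) -> prediction


def is_retrieve(s_hat: float, threshold: float) -> bool:
    """Assumed rule: retrieve only when the estimated quality is below the threshold."""
    return s_hat < threshold


def select(candidates: List[Tuple[str, float]]) -> str:
    """Assumed rule: keep the prediction with the highest estimated quality."""
    return max(candidates, key=lambda c: c[1])[0]


def complete(
    context: str,
    retrieve: Retriever,
    generate: Generator,
    score: Scorer,
    threshold: float = 0.8,  # hypothetical value
    max_iters: int = 2,      # hypothetical cap on retrieval-generation rounds
) -> Tuple[str, int]:
    """Return the selected prediction and the number of retrievals performed."""
    prediction = generate(context, "")  # zero-shot generation, no retrieved context
    candidates = [(prediction, score(prediction))]
    retrievals = 0
    # Keep retrieving only while the latest prediction looks unreliable.
    while retrievals < max_iters and is_retrieve(candidates[-1][1], threshold):
        snippets = retrieve(context, candidates[-1][0])
        retrievals += 1
        prediction = generate(context, snippets)
        candidates.append((prediction, score(prediction)))
    return select(candidates), retrievals
```

In this sketch, a zero-shot prediction that already scores above the threshold skips retrieval entirely, which is where the savings in retrieval count and latency would come from. Keeping every candidate and selecting at the end guards against a later iteration producing a worse prediction than an earlier one.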

Paper Structure (29 sections, 6 equations, 9 figures, 8 tables, 1 algorithm)

Introduction
Methodology
Uncertainty estimation
Adaptive retrieval
Selective accept
Applications in RAG-based code completion
Experimental Setup
Evaluation Datasets
Evaluation Metrics
Target RAG-based code completion system
CARD
Results
Is CARD beneficial to RAG-based code completion task?
Performance
Latency
...and 14 more sections

Figures (9)

Figure 1: The distribution of ES (top) and improvement of ES (bottom) on the line dataset of RepoEval (Zhang et al., 2023), for the zero-shot setup (left) and iterative RAG (right). RG$i$ stands for the $i$-th Retrieval-Generation.
Figure 2: CARD: Suppose that a RAG-based code completion system uses the function $R$ for retrieval and the function $G$ for generation. Whenever a code prediction is generated, with or without retrieved information, the function $\text{isRetrieve}$ can be queried to determine whether retrieval is necessary. When multiple predictions are generated in different iterations, the function $\text{Select}$ can be queried to determine which prediction is the best. These two functions introduce little overhead and can be used independently.
Figure 3: Improvement of RAG on two datasets (left: line, right: API) of RepoEval (Zhang et al., 2023) with CodeLlama 7b as code LM. The x-axis represents the ES of zero-shot generation (top) and RG1 generation (bottom), and the y-axis represents the improvement of ES via RAG.
Figure 4: Performance under different retrievers when the code LM is DeepSeek-Coder-7B.
Figure 5: The ground truth ES versus the predicted ES. Each bar represents the average ES of the predicted values within the range on the x-axis.
...and 4 more figures
