A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model

Wenrui Zhang, Tiehang Fu, Ting Yuan, Ge Zhang, Dong Chen, Jie Wang

TL;DR

This paper tackles inefficiencies in retrieval-augmented code completion by introducing CARD, a lightweight, uncertainty-based critique plug-in that decides when to retrieve and which of multiple predictions to accept. An estimator, trained with LightGBM on a 13-feature representation derived from LM outputs, predicts a quality score $\hat{s}$. Two functions, $\text{isRetrieve}$ and $\text{Select}$, use this score to adapt retrieval and acceptance across iterative RAG workflows. Experiments on RepoEval and RepoEval-M across multiple languages and code LMs show that CARD reduces retrieval overhead by up to 46.5% and latency by 16%–83% while improving edit similarity (ES) and, in several settings, exact match (EM) and unit-test pass rate (UT). The results indicate that CARD generalizes across retrievers, generators, languages, and even unseen LMs, supporting its deployment as a universal, low-cost augmentation to RAG-based code completion systems.

Abstract

Recent advancements in Retrieval-Augmented Generation (RAG) have significantly enhanced code completion at the repository level. Various RAG-based code completion systems have been proposed based on different design choices; for instance, some gain effectiveness at the cost of repeating the retrieval-generation process multiple times. However, the indiscriminate use of retrieval in current methods causes problems in both efficiency and effectiveness, as a considerable portion of retrievals are unnecessary and may introduce unhelpful or even harmful suggestions to code language models. To address these challenges, we introduce CARD, a lightweight critique method designed to provide insights into the necessity of retrievals and to select the optimal answer from multiple predictions. CARD can seamlessly integrate into any RAG-based code completion system. Our evaluation shows that CARD saves 21% to 46% of retrievals for line completion, 14% to 40% for API completion, and 6% to 46.5% for function completion, while improving accuracy. CARD reduces latency by 16% to 83%. CARD generalizes to different LMs, retrievers, and programming languages. It is lightweight, training in a few seconds and running inference in a few milliseconds.

Paper Structure (29 sections, 6 equations, 9 figures, 8 tables, 1 algorithm)

Figures (9)

  • Figure 1: Distribution of ES (top) and improvement in ES (bottom) on the line dataset of RepoEval (Zhang et al., 2023) under the zero-shot setup (left) and iterative RAG (right). RG$i$ denotes the $i$-th retrieval-generation iteration.
  • Figure 2: CARD: Suppose that a RAG-based code completion system uses the function $R$ for retrieval and uses the function $G$ for generation. Whenever a code prediction is generated with or without retrieved information, the function $isRetrieve$ can be queried to determine whether retrieval is necessary. When multiple predictions are generated from different iterations, the function $select$ can be queried to determine which prediction is the best. These two functions introduce little overhead and can be used independently.
  • Figure 3: Improvement from RAG on two datasets of RepoEval (Zhang et al., 2023) (left: line, right: API) with CodeLlama 7b as the code LM. The x-axis shows the ES of zero-shot generation (top) and RG1 generation (bottom); the y-axis shows the improvement in ES obtained via RAG.
  • Figure 4: Performance under different retrievers when the code LM is DeepSeek-Coder-7B.
  • Figure 5: The ground truth ES versus the predicted ES. Each bar represents the average ES of the predicted values within the range on the x-axis.
  • ...and 4 more figures
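
The control flow described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that $isRetrieve$ compares the estimated score against a threshold and that $select$ picks the candidate with the highest estimated score. The names `complete`, `estimate`, `THRESHOLD`, and `max_iters` are hypothetical, and the threshold value is a placeholder that does not come from the paper.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical threshold; the source does not state a value.
THRESHOLD = 0.7


def is_retrieve(score: float, threshold: float = THRESHOLD) -> bool:
    """Assumed rule: retrieve when the estimated quality falls below the threshold."""
    return score < threshold


def select(candidates: List[Tuple[str, float]]) -> str:
    """Assumed rule: keep the prediction with the highest estimated score."""
    return max(candidates, key=lambda c: c[1])[0]


def complete(
    context: str,
    R: Callable[[str, str], List[str]],            # retrieval function
    G: Callable[[str, Optional[List[str]]], str],  # generation function
    estimate: Callable[[str], float],              # stand-in for the score estimator
    max_iters: int = 2,
    threshold: float = THRESHOLD,
) -> Tuple[str, int]:
    """Run adaptive retrieval; return the chosen prediction and the retrieval count."""
    prediction = G(context, None)  # first pass without retrieved snippets
    score = estimate(prediction)
    candidates = [(prediction, score)]
    retrievals = 0
    # Keep retrieving only while the estimator judges the prediction too weak.
    while is_retrieve(score, threshold) and retrievals < max_iters:
        snippets = R(context, prediction)
        retrievals += 1
        prediction = G(context, snippets)
        score = estimate(prediction)
        candidates.append((prediction, score))
    # Retrieval can hurt, so choose among all iterations rather than the last one.
    return select(candidates), retrievals
```

Under these assumptions, a confident first prediction skips retrieval entirely, and a later iteration that scores worse than an earlier one is discarded at selection time. Because the two helpers do not depend on each other, either can be used alone, which matches the caption's remark that they can be used independently.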