A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model
Wenrui Zhang, Tiehang Fu, Ting Yuan, Ge Zhang, Dong Chen, Jie Wang
TL;DR
This paper tackles inefficiencies in retrieval-augmented code completion by introducing CARD, a lightweight, uncertainty-based critique plug-in that decides when to retrieve and which of multiple predictions to keep. An estimator, trained with LightGBM on a 13-feature representation derived from LM outputs, predicts a quality score $\hat{s}$. Two functions, $\text{isRetrieve}$ and $\text{Select}$, use this score to adapt retrieval and answer selection across iterative RAG workflows. Extensive experiments on RepoEval and RepoEval-M across multiple languages and code LMs show that CARD reduces retrieval overhead by up to 46.5% and latency by 16%–83% while improving ES and, in several settings, EM and UT. The results demonstrate CARD’s generalizability across retrievers, generators, languages, and even unseen LMs, supporting its deployment as a universal, low-cost augmentation to RAG-based code completion systems.
Abstract
Recent advancements in Retrieval-Augmented Generation (RAG) have significantly enhanced code completion at the repository level. Various RAG-based code completion systems have been proposed, based on different design choices; for instance, some gain effectiveness at the cost of repeating the retrieval-generation process multiple times. However, the indiscriminate use of retrieval in current methods causes problems in both efficiency and effectiveness, as a considerable portion of retrievals are unnecessary and may introduce unhelpful or even harmful suggestions to code language models. To address these challenges, we introduce CARD, a lightweight critique method that assesses whether a retrieval is necessary and selects the best answer from multiple predictions. CARD can be seamlessly integrated into any RAG-based code completion system. Our evaluation shows that CARD saves 21% to 46% of retrievals for line completion, 14% to 40% for API completion, and 6% to 46.5% for function completion, while improving accuracy. CARD reduces latency by 16% to 83%. CARD generalizes to different LMs, retrievers, and programming languages. It is lightweight, requiring a few seconds for training and a few milliseconds for inference.
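To make the control flow concrete, the following is a minimal Python sketch of how a critique score could gate retrieval and select among predictions in an iterative RAG loop. It is an illustration under stated assumptions, not the authors' implementation: the names `Prediction`, `toy_score`, `complete`, `threshold`, and `max_retrievals` are hypothetical, and `toy_score` (mean token probability) merely stands in for the LightGBM estimator over 13 features, whose definition is not given here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Prediction:
    text: str
    token_probs: Sequence[float]  # per-token probabilities reported by the LM


# Hypothetical interfaces for the generator, retriever, and critique scorer.
Generate = Callable[[str, List[str]], Prediction]
Retrieve = Callable[[str, str], List[str]]
Score = Callable[[Prediction], float]


def toy_score(pred: Prediction) -> float:
    """Stand-in for the learned estimator (assumption): mean token probability."""
    if not pred.token_probs:
        return 0.0
    return sum(pred.token_probs) / len(pred.token_probs)


def is_retrieve(score: float, threshold: float) -> bool:
    """Request a retrieval only when the predicted quality is below the threshold."""
    return score < threshold


def select(candidates: List[Tuple[Prediction, float]]) -> Prediction:
    """Keep the candidate with the highest predicted quality score."""
    return max(candidates, key=lambda c: c[1])[0]


def complete(
    prompt: str,
    generate: Generate,
    retrieve: Retrieve,
    score: Score = toy_score,
    threshold: float = 0.8,   # assumed value, for illustration only
    max_retrievals: int = 2,  # assumed cap on retrieval-generation rounds
) -> Tuple[Prediction, int]:
    """Generate first without retrieval, then retrieve only while the score is low."""
    first = generate(prompt, [])
    candidates = [(first, score(first))]
    retrievals = 0
    while retrievals < max_retrievals and is_retrieve(candidates[-1][1], threshold):
        # Use the latest draft to query the retriever, then regenerate with context.
        context = retrieve(prompt, candidates[-1][0].text)
        retrievals += 1
        pred = generate(prompt, context)
        candidates.append((pred, score(pred)))
    return select(candidates), retrievals
```

In this sketch, a confident first prediction returns with zero retrievals, and a retrieval that lowers the score does not displace an earlier, higher-scoring prediction, which mirrors the two roles the abstract assigns to the critique method.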
