Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

Wei Cheng; Yuhan Wu; Wei Hu

TL;DR

DraCo addresses repository-level code completion by using an extended dataflow analysis to build a repo-specific context graph that guides precise retrieval of background knowledge. By extracting fine-grained import information and following type-sensitive data dependencies, it generates well-formed prompts that help code LMs produce correct completions in private repositories. The approach improves code exact match and identifier accuracy across diverse models and datasets (e.g., EM up by 3.43% and ID.EM up by 3.62%) while remaining efficient enough for real-time use in an IDE. The combination of dataflow-guided retrieval and structured prompt generation applies across code LMs and could be extended to more languages and static-analysis frameworks.
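As a rough illustration of what "extracting fine-grained import information" could look like for Python, the sketch below uses the standard `ast` module to map each locally bound name to the qualified name it was imported from. The function name `extract_imports` and the output format are assumptions made for this example; they are not taken from DraCo's implementation.

```python
import ast
from typing import Dict


def extract_imports(source: str) -> Dict[str, str]:
    """Map each locally bound name to the qualified name it was imported from.

    Hypothetical helper for illustration only; it is not DraCo's API.
    """
    bindings: Dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    # `import a.b as c` binds `c` to the module `a.b`.
                    bindings[alias.asname] = alias.name
                else:
                    # `import a.b` binds only the top-level package `a`.
                    top = alias.name.split(".")[0]
                    bindings[top] = top
        elif isinstance(node, ast.ImportFrom):
            # Relative imports are written with leading dots, one per level.
            prefix = "." * node.level
            for alias in node.names:
                parts = [p for p in (node.module, alias.name) if p]
                bindings[alias.asname or alias.name] = prefix + ".".join(parts)
    return bindings
```

A retriever could then use such a mapping to decide which cross-file entities a variable in the unfinished code may refer to. Note that `ast.parse` requires syntactically valid input, so a real system would need to handle incomplete code separately.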

Abstract

Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories. Previous studies retrieve cross-file context based on import relations or text similarity, which is insufficiently relevant to completion targets. In this paper, we propose a dataflow-guided retrieval augmentation approach, called DraCo, for repository-level code completion. DraCo parses a private repository into code entities and establishes their relations through an extended dataflow analysis, forming a repo-specific context graph. Whenever code completion is triggered, DraCo precisely retrieves relevant background knowledge from the repo-specific context graph and generates well-formed prompts to query code LMs. Furthermore, we construct a large Python dataset, ReccEval, with more diverse completion targets. Our experiments demonstrate the superior accuracy and practical efficiency of DraCo, which improves code exact match by 3.43% and identifier F1-score by 3.27% on average compared to the state-of-the-art approach.
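To make the idea of a repo-specific context graph more concrete, the following minimal sketch stores code entities with "contains" and "depends" relations and retrieves background knowledge by a breadth-first walk from a starting entity. The class `ContextGraph`, its methods, the traversal order, and the entity names other than `newSignal` are all assumptions for illustration; the paper's actual graph construction and retrieval strategy may differ.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional


class ContextGraph:
    """Toy context graph (hypothetical; not DraCo's actual data structure)."""

    def __init__(self) -> None:
        self.text: Dict[str, str] = {}             # entity name -> source snippet
        self.contains = defaultdict(list)          # parent -> child entities
        self.depends = defaultdict(list)           # entity -> entities it uses

    def add_entity(self, name: str, text: str, parent: Optional[str] = None) -> None:
        self.text[name] = text
        if parent is not None:
            self.contains[parent].append(name)

    def add_dependency(self, src: str, dst: str) -> None:
        self.depends[src].append(dst)

    def retrieve(self, start: str, max_entities: int = 10) -> List[str]:
        """Breadth-first walk over both relation types, starting at `start`."""
        seen, order, queue = {start}, [], deque([start])
        while queue and len(order) < max_entities:
            node = queue.popleft()
            if node in self.text:
                order.append(node)
            neighbours = self.contains.get(node, []) + self.depends.get(node, [])
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return order


# Hypothetical repository fragment: a factory function that returns a class instance.
graph = ContextGraph()
graph.add_entity("signals.Signal", "class Signal: ...")
graph.add_entity("signals.Signal.connect", "def connect(self, slot): ...",
                 parent="signals.Signal")
graph.add_entity("signals.newSignal", "def newSignal() -> Signal: ...")
graph.add_dependency("signals.newSignal", "signals.Signal")

retrieved = graph.retrieve("signals.newSignal")
prompt_context = "\n".join(graph.text[name] for name in retrieved)
```

In this toy setup, starting from `signals.newSignal` reaches the class it returns and then that class's method, which is the kind of background knowledge a completion model would need to finish a line that uses the returned object.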

Paper Structure (36 sections, 7 figures, 21 tables, 2 algorithms)

This paper contains 36 sections, 7 figures, 21 tables, 2 algorithms.

Introduction
Related Work
Code completion.
Retrieval-augmented generation.
Methodology
Dataflow Analysis
Repo-specific Context Graph
Dataflow-Guided Retrieval
Prompt Generation
Experiment Setup
Datasets
Implementation Details
Evaluation Metrics
Experimental Results and Analysis
Performance Comparison
...and 21 more sections

Figures (7)

Figure 1: A real-world example of repository-level code completion. The code LM CodeGen25-7B-mono fails to complete the last code line correctly when entering only the unfinished code (Zero-Shot). The model needs background knowledge relevant to newSignal, and the retrieval of this knowledge can be guided by dataflow.
Figure 2: Overview of our approach, where dataflow analysis is crucial for both indexing and retrieval. The details of the unfinished code are shown in Figure 1. The rectangular boxes visualize "contains" relations between the code entities in the repo-specific context graph, and the solid arrows indicate "depends" relations.
Figure 3: Performance comparison of two prompt scopes on the CrossCodeEval dataset.
Figure 4: Performance changes with different maximum input lengths on the CrossCodeEval dataset.
Figure 5: An example of our DFG, which corresponds to the unfinished code in Figure 1. The numbers labeled in the DFG correspond to the line numbers of the variables. The labels on the edges are the initials of the relation names defined in the Dataflow Analysis section.
...and 2 more figures
