REPOFUSE: Repository-Level Code Completion with Fused Dual Context

Ming Liang; Xiaoheng Xie; Gehao Zhang; Xunjin Zheng; Peng Di; Wei Jiang; Hongwei Chen; Chengpeng Wang; Gang Fan

TL;DR

RepoFuse addresses the Context-Latency Conundrum in repository-level code completion: adding repository context improves accuracy but lengthens prompts and slows inference. It fuses two cross-file contexts, an analogy context obtained by similarity-based retrieval and a rationale context derived from a Code Knowledge Graph, and compresses them with Rank Truncated Generation (RTG) to fit a fixed token budget. On CrossCodeEval, RepoFuse reports a 40.90% to 59.75% increase in exact-match completion accuracy (Code_EM) and a 26.8% improvement in inference speed, while using only a fraction of the typical context length. RepoFuse has also been integrated into the workflow of a large enterprise. Future directions include context pruning, fill-in-the-middle (FIM) integration, and scaling to larger LMs.

Abstract

The success of language models in code assistance has spurred the proposal of repository-level code completion, which uses context from the entire codebase to enhance prediction accuracy. However, this amplified context can inadvertently increase inference latency, potentially undermining the developer experience and deterring tool adoption, a challenge we term the Context-Latency Conundrum. This paper introduces REPOFUSE, a solution designed to enhance repository-level code completion without the latency trade-off. REPOFUSE fuses two types of context: the analogy context, rooted in code analogies, and the rationale context, which encompasses in-depth semantic relationships. We propose a novel Rank Truncated Generation (RTG) technique that efficiently condenses these contexts into prompts of restricted size. This enables REPOFUSE to deliver precise code completions while maintaining inference efficiency. In tests with the CrossCodeEval suite, REPOFUSE demonstrates a significant leap over existing models, achieving a 40.90% to 59.75% increase in exact match (EM) accuracy for code completions and a 26.8% improvement in inference speed. Beyond experimental validation, REPOFUSE has been integrated into the workflow of a large enterprise, where it actively supports various coding tasks.
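The idea of condensing ranked contexts into a size-restricted prompt can be illustrated with a minimal sketch. This is not the paper's implementation: the `Chunk` type, the precomputed `score` field, the whitespace-based `count_tokens`, and the stop-at-first-overflow rule are all assumptions made for illustration, since the abstract does not specify the scoring function or the tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str   # "analogy" or "rationale" (labels assumed for illustration)
    text: str
    score: float  # relevance to the unfinished code; the scoring function is assumed


def count_tokens(text: str) -> int:
    # Whitespace splitting stands in for a real model tokenizer (assumption).
    return len(text.split())


def rank_and_truncate(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Sort chunks by descending score and keep the prefix that fits the budget."""
    kept: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = count_tokens(chunk.text)
        if used + cost > budget:
            break  # truncate: everything ranked lower is dropped
        kept.append(chunk)
        used += cost
    return kept


def build_prompt(chunks: list[Chunk], unfinished_code: str, budget: int) -> str:
    """Prepend the surviving context chunks, as comments, to the unfinished code."""
    kept = rank_and_truncate(chunks, budget)
    context = "\n".join(f"# [{c.source}] {c.text}" for c in kept)
    return f"{context}\n{unfinished_code}" if context else unfinished_code


chunks = [
    Chunk("analogy", "def add(a, b): return a + b", 0.9),
    Chunk("rationale", "class Calc: def add(self, a, b)", 0.7),
    Chunk("analogy", "x = 1", 0.2),
]
prompt = build_prompt(chunks, "result = calc.", budget=8)
```

In this sketch both context types compete in a single ranking, so the budget is spent on whichever chunks score highest regardless of their origin. A tighter budget shortens the prompt, which is where the latency saving would come from.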

Paper Structure (25 sections, 1 equation, 15 figures, 3 tables, 1 algorithm)

Introduction
Preliminaries
Methodology
Rationale Context Analysis
Analogy Context Retrieval
RTG-Empowered Completion
Experiments and Results
Implementation
Benchmark
Configuration
Results and Analysis
Performance
Inference Efficiency
Comparison of RTG Score Functions
Related Work
...and 10 more sections

Figures (15)

Figure 1: The workflow of RepoFuse
Figure 2: Illustrative Example of Rationale Context and Analogy Context in Use
Figure 3: The comparison of Code_EM performance on all truncation sizes for AC, RC and DC.
Figure 4: Convergence Analysis of Code_EM for Generated Cases by AC, RC and DC on DeepSeek-Coder-7B.
Figure 5: Comparative Assessment of Code_EM Values Across Various Truncation Sizes Using DC with Diverse Scoring Functions.
...and 10 more figures

Theorems & Definitions (6)

Definition 1
Definition 2
Definition 3
Definition 4
Definition 5
Definition 6
