Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding

Xinkui Zhao, Rongkai Liu, Yifan Zhang, Chen Zhi, Lufei Zhang, Guanjie Cheng, Yueshen Xu, Shuiguang Deng, Jianwei Yin

TL;DR

This work targets repository-level code completion by addressing the shortcomings of traditional RAG approaches that treat code as plain text. It introduces CoCo, a comprehension-first framework that uses static analysis to extract function-, file-, and project-level context, then distills this information with a graph-based selector and a structure-aware re-ranker before prompting an LLM. The method integrates multi-granularity context with retrieved exemplars to produce more semantically and structurally coherent code completions, and it demonstrates strong gains across CrossCodeEval and RepoEval while remaining model-agnostic. Experiments show substantial EM improvements (up to 20.2%), good generalizability, and acceptable latency overhead, underscoring CoCo’s practical impact on real-world repository-level code generation.

Abstract

As the code completion task shifts from the function level to the repository level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation (RAG) methods typically treat code as plain natural language, relying primarily on shallow semantic matching while overlooking structural semantics and code-specific dependencies. This limits their ability to capture control flow and underlying intent, ultimately constraining the quality of generated code. To address this, we propose CoCo, a novel framework that enables code Completion by Comprehension of multi-granularity context from large-scale code repositories. CoCo employs static code analysis to extract structured context at the function, file, and project levels, capturing execution logic and semantic dependencies. It then adopts a graph-based multi-granularity context selection mechanism to filter out redundant information and remove noise. The selected information is then converted into natural language in a consistent format, so that it functions as an explicit contextual prompt guiding subsequent code completion. Additionally, a structure-aware code re-ranker ensures alignment at both the semantic and structural levels. Extensive experiments on the CrossCodeEval and RepoEval benchmarks demonstrate that CoCo consistently surpasses state-of-the-art baselines, achieving gains of up to 20.2% in EM. Moreover, the framework is model-agnostic and can be seamlessly integrated into existing methods, yielding significant performance improvements.
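To make the "static analysis, then natural language" idea concrete, the following is a minimal sketch rather than CoCo's actual implementation. It assumes Python source and uses only the standard `ast` module; the function name `describe_file`, the sentence templates, and the `SAMPLE` snippet are hypothetical choices made for illustration. It covers only a simplified form of file- and function-level context (imports, signatures, direct calls) and omits the graph-based selection and re-ranking steps.

```python
import ast


def describe_file(source: str) -> list[str]:
    """Return one natural-language sentence per top-level import and function.

    Illustrative only: the sentence templates are assumptions, not the
    prompt format used in the paper.
    """
    tree = ast.parse(source)
    sentences = []
    for node in tree.body:
        if isinstance(node, ast.Import):
            names = ", ".join(alias.name for alias in node.names)
            sentences.append(f"The file imports {names}.")
        elif isinstance(node, ast.ImportFrom):
            names = ", ".join(alias.name for alias in node.names)
            module = node.module or "."  # relative import without a module name
            sentences.append(f"The file imports {names} from {module}.")
        elif isinstance(node, ast.FunctionDef):
            params = ", ".join(arg.arg for arg in node.args.args)
            # Collect the names of plain calls such as f(x); attribute calls
            # such as obj.f(x) are ignored to keep the sketch short.
            calls = sorted({
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            })
            called = ", ".join(calls) if calls else "no other functions"
            sentences.append(f"Function {node.name}({params}) calls {called}.")
    return sentences


# Hypothetical input file used only to demonstrate the sketch.
SAMPLE = """
import os
from utils import load

def read(path):
    data = load(path)
    return len(data)
"""

if __name__ == "__main__":
    for sentence in describe_file(SAMPLE):
        print(sentence)
```

Sentences produced this way could be prepended to a completion prompt as explicit context. A fuller system would also need project-level dependency extraction and a selection step to discard irrelevant sentences, as the abstract describes.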

Paper Structure

This paper contains 31 sections, 2 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: Traditional retrieval-based methods can introduce semantic and structural context gaps, leading to the generation of incorrect code.
  • Figure 2: Overview of CoCo.
  • Figure 3: Illustration of Function-Level Analysis. The yellow code block indicates unfinished code.
  • Figure 4: Illustration of File-Level Analysis. The yellow code block indicates unfinished code. The green code block indicates useful information.
  • Figure 5: Illustration of Project-Level Analysis. The yellow code block indicates unfinished code. The green code block indicates useful information.
  • ...and 4 more figures