Beyond More Context: How Granularity and Order Drive Code Completion Quality

Uswat Yusuf, Genevieve Caumartin, Diego Elias Costa

TL;DR

This work addresses the challenge of providing high-quality context for code completion in large repositories, where supplying the full repository as context is impractical because of token limits and noise. It systematically compares retrieval granularity from file level to chunk level, using BM25 ranking, static-analysis-based chunking via Tree-sitter, and local-scope trimming, evaluated on Python and Kotlin in a fill-in-the-middle setup across practice and public/private phases. The key finding is that chunk-level retrieval with local-scope trimming consistently improves performance over file-based approaches: for Python in the initial phase, it achieves a $6\%$ improvement over the best file-retrieval strategy and a $16\%$ improvement over the no-context baseline. The work includes per-language adaptations (Python vs. Kotlin) and replication support. The results provide actionable guidance for real-world code-context pipelines and highlight open challenges in semantic retrieval, embeddings, and cross-language generalization for scalable software development tools.

Abstract

Context plays an important role in the quality of code completion, as Large Language Models (LLMs) require sufficient and relevant information to assist developers in code generation tasks. However, composing a relevant context for code completion poses challenges in large repositories: First, the limited context length of LLMs makes it impractical to include all repository files. Second, the quality of generated code is highly sensitive to noisy or irrelevant context. In this paper, we present our approach for the ASE 2025 Context Collection Challenge. The challenge entails outperforming JetBrains baselines by designing effective retrieval and context collection strategies. We develop and evaluate a series of experiments that involve retrieval strategies at both the file and chunk levels. We focus our initial experiments on examining the impact of context size and file ordering on LLM performance. Our results show that the amount and order of context can significantly influence the performance of the models. We introduce chunk-based retrieval using static analysis, achieving a 6% improvement over our best file-retrieval strategy and a 16% improvement over the no-context baseline for Python in the initial phase of the competition. Our results highlight the importance of retrieval granularity, ordering and hybrid strategies in developing effective context collection pipelines for real-world development scenarios.
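To make the idea of chunk-level retrieval concrete, the following is a minimal sketch and not the authors' pipeline. It makes several assumptions: Python's standard-library `ast` module stands in for Tree-sitter, chunks are limited to top-level functions and classes, and BM25 is a small hand-rolled Okapi variant with commonly used default parameters (`k1`, `b`). The function names `split_into_chunks`, `tokenize`, and `bm25_rank` are hypothetical and chosen only for illustration.

```python
import ast
import math
import re
from collections import Counter


def split_into_chunks(source: str) -> list[str]:
    """Split Python source into top-level function/class chunks.

    Assumption: the stdlib ast module replaces Tree-sitter here.
    Decorator lines are not included in the extracted chunk.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks


def tokenize(text: str) -> list[str]:
    """Lowercase identifier-like tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())


def bm25_rank(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return chunk indices sorted from most to least relevant to the query."""
    docs = [tokenize(chunk) for chunk in chunks]
    n = len(docs)
    if n == 0:
        return []
    avg_len = sum(len(doc) for doc in docs) / n
    # Document frequency: in how many chunks each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    query_terms = set(tokenize(query))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    # sorted() is stable, so ties keep their original file order.
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

In a fill-in-the-middle setting, the query could be the code surrounding the completion point, and the top-ranked chunks would then be placed into the context within the token budget. How many chunks to keep and in what order to place them are exactly the design choices the paper investigates.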

Paper Structure

This paper contains 14 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Number of files vs chrF score