Relative Positioning Based Code Chunking Method For Rich Context Retrieval In Repository Level Code Completion Task With Code Language Model
Imranur Rahman, Md Rayhanur Rahman
TL;DR
The paper tackles the challenge of identifying useful context for repository-level code completion with large language models. It presents a language-agnostic pipeline that chunks code into overlapping blocks, computes embeddings, and uses a hybrid BM25 and FAISS retriever to fetch syntactic and semantic neighbors; each retrieved chunk is augmented with its next and previous chunks to form the final context. The approach combines the completion file, recently opened files, and neighborhoods around the prefix and suffix to improve fill-in-the-middle predictions, achieving competitive chrF scores and Bronze placements in the Kotlin and Python tracks. The work demonstrates a scalable method for enriching code context across repositories and points to dynamic chunking and hierarchical code representations as future directions.
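To make the chunking step concrete, the following is a minimal sketch of overlapping line-based chunking with previous/next neighbor expansion. It is not the authors' implementation: the names `Chunk`, `chunk_lines`, and `with_neighbors`, the line-based splitting, and the default `size` and `overlap` values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    index: int  # position of the chunk within its file
    start: int  # first line, 0-based, inclusive
    end: int    # last line, 0-based, exclusive
    text: str


def chunk_lines(source: str, size: int = 4, overlap: int = 2) -> List[Chunk]:
    """Split source into windows of `size` lines that share `overlap` lines.

    The window size and overlap are hypothetical defaults, not values
    reported by the paper.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    lines = source.splitlines()
    step = size - overlap
    chunks: List[Chunk] = []
    start = 0
    while start < len(lines):
        end = min(start + size, len(lines))
        chunks.append(Chunk(len(chunks), start, end, "\n".join(lines[start:end])))
        if end == len(lines):
            break  # the last window already reaches the end of the file
        start += step
    return chunks


def with_neighbors(chunks: List[Chunk], hit: int) -> List[Chunk]:
    """Return the retrieved chunk together with its previous and next chunk."""
    lo = max(hit - 1, 0)
    hi = min(hit + 1, len(chunks) - 1)
    return chunks[lo:hi + 1]
```

Expanding a hit to its neighbors keeps the surrounding code in file order, so a retrieved function body arrives with the lines that precede and follow it.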
Abstract
Code completion can help developers improve efficiency and ease the development lifecycle. Although code completion is available in modern integrated development environments (IDEs), little research has examined what makes a good context, drawn from the information available to the IDE, for large language models (LLMs) to perform well at code completion. In this paper, we describe an effective context collection strategy that helps LLMs perform better at code completion tasks. The key idea of our strategy is to preprocess the repository into smaller code chunks and then retrieve chunks by syntactic and semantic similarity, placing them in the final context with relative positioning. We found that both code chunking and the relative positioning of the chunks in the final context improve performance on code completion tasks.
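The retrieval idea can be illustrated with a small, dependency-free sketch. Everything here is a stand-in and an assumption: token-set overlap replaces BM25, character-trigram cosine similarity replaces learned embeddings with a FAISS index, and the mixing weight `alpha` and the function name `hybrid_rank` are hypothetical. The abstract does not specify how chunks are ordered, so the sketch assumes one plausible reading of relative positioning: the selected chunks are emitted in ascending score order, which puts the most relevant chunk last, closest to the completion point.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokens(text: str) -> List[str]:
    """Identifier-like tokens, lowercased."""
    return re.findall(r"[A-Za-z_]\w*", text.lower())


def lexical_score(query: str, chunk: str) -> float:
    """Jaccard overlap of token sets (a simple stand-in for BM25)."""
    q, c = set(tokens(query)), set(tokens(chunk))
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def trigram_vector(text: str) -> Counter:
    """Character-trigram counts (a stand-in for a learned embedding)."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_rank(
    query: str, chunks: List[str], k: int = 2, alpha: float = 0.5
) -> List[Tuple[int, float]]:
    """Pick the top-k chunks by a weighted mix of both scores.

    Returns (chunk_index, score) pairs in ascending score order, so the best
    match comes last. That ordering is an assumption of this sketch.
    """
    qv = trigram_vector(query)
    scored = [
        (alpha * lexical_score(query, c) + (1 - alpha) * cosine(qv, trigram_vector(c)), i)
        for i, c in enumerate(chunks)
    ]
    top = sorted(scored, reverse=True)[:k]
    return [(i, s) for s, i in sorted(top)]
```

A caller would pass the text around the cursor as `query`, then concatenate the returned chunks in order ahead of the completion prefix.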
