STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis
Junwei Liu, Yixuan Chen, Mingwei Liu, Xin Peng, Yiling Lou
TL;DR
This work tackles repository-level code completion by integrating static analysis into the LLM inference pipeline. It introduces STALL+, a flexible framework that injects static-analysis insights at prompting, decoding, and post-processing stages, and evaluates its effectiveness on CrossCodeEval across Java and Python with three state-of-the-art LLMs. Key findings show that prompting-phase integration with file-level dependencies yields the largest improvements, while post-processing offers the least gain; combining multiple strategies and pairing static analysis with RAG delivers the best accuracy and a favorable efficiency-accuracy trade-off. The results provide practical guidance for deploying static-analysis-enhanced LLMs in real-world, large-scale codebases and establish a strong baseline for future work in this space.
Abstract
Repository-level code completion is challenging because it involves complicated contexts spanning multiple files in the repository. To date, researchers have proposed two technical categories to enhance LLM-based repository-level code completion: retrieval-augmented generation (RAG) and static analysis integration. This work performs the first study of static analysis integration in LLM-based repository-level code completion, investigating both the effectiveness and efficiency of static analysis integration strategies across different phases of code completion. We first implement a framework, STALL+, which supports extensible and customizable integration of multiple static analysis strategies into the complete pipeline of LLM-based repository-level code completion. Based on STALL+, we perform extensive experiments with different code LLMs on the latest repository-level code completion benchmark, CrossCodeEval. Our findings show that integrating file-level dependencies in the prompting phase performs the best, while integration in the post-processing phase performs the worst. We also observe different improvements from static analysis between dynamic and static languages: the best combination is prompting-phase with decoding-phase integration for Java, whereas for Python it is prompting-phase with post-processing-phase integration, given the limitations of statically analyzing dynamic languages. Finally, we find that RAG and static analysis integration are complementary and that their combination is cost-effective.
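To make the idea of prompting-phase integration with file-level dependencies concrete, the following is a minimal Python sketch. It is not the STALL+ implementation: the helper names `local_imports` and `build_prompt`, the in-memory `repo` dictionary mapping module names to source text, and the comment-style separator are all assumptions made for illustration. The sketch scans the unfinished file for single-line import statements, keeps those that resolve to modules in the repository, and prepends the source of those modules to the prompt.

```python
import ast


def local_imports(source, repo_modules):
    """Return repository-local modules imported by `source`, in order.

    The unfinished file may not parse as a whole, so each single-line
    import statement is parsed on its own (a simplifying assumption;
    multi-line and relative imports are skipped).
    """
    found = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped.startswith(("import ", "from ")):
            continue
        try:
            node = ast.parse(stripped).body[0]
        except (SyntaxError, IndexError):
            continue  # incomplete or multi-line import: ignore it
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            names = []
        for name in names:
            if name in repo_modules and name not in found:
                found.append(name)
    return found


def build_prompt(unfinished, repo):
    """Prepend the source of imported repository files to the prompt.

    `repo` is a hypothetical mapping from module name to source text.
    """
    deps = local_imports(unfinished, set(repo))
    context = "".join(
        f"# --- dependency: {name} ---\n{repo[name]}\n" for name in deps
    )
    return context + unfinished


if __name__ == "__main__":
    repo = {"utils": "def add(a, b):\n    return a + b\n"}
    unfinished = (
        "import os\nfrom utils import add\n\n"
        "def total(xs):\n    return add("
    )
    print(build_prompt(unfinished, repo))
```

A real pipeline would also need to respect the model's context window, for example by truncating or ranking the dependency context; this sketch omits that step.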
