STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis

Junwei Liu, Yixuan Chen, Mingwei Liu, Xin Peng, Yiling Lou

TL;DR

This work tackles repository-level code completion by integrating static analysis into the LLM inference pipeline. It introduces STALL+, a flexible framework that injects static-analysis insights at prompting, decoding, and post-processing stages, and evaluates its effectiveness on CrossCodeEval across Java and Python with three state-of-the-art LLMs. Key findings show that prompting-phase integration with file-level dependencies yields the largest improvements, while post-processing offers the least gain; combining multiple strategies and pairing static analysis with RAG delivers the best accuracy and a favorable efficiency-accuracy trade-off. The results provide practical guidance for deploying static-analysis-enhanced LLMs in real-world, large-scale codebases and establish a strong baseline for future work in this space.

Abstract

Repository-level code completion is challenging because it involves complicated contexts spread across multiple files in the repository. To date, researchers have proposed two technical categories to enhance LLM-based repository-level code completion: retrieval-augmented generation (RAG) and static analysis integration. This work performs the first study of static analysis integration in LLM-based repository-level code completion, investigating both the effectiveness and the efficiency of static analysis integration strategies across the different phases of code completion. We first implement a framework, STALL+, which supports extensible and customizable integration of multiple static analysis strategies into the complete pipeline of LLM-based repository-level code completion. Based on STALL+, we perform extensive experiments with different code LLMs on the latest repository-level code completion benchmark, CrossCodeEval. Our findings show that integrating file-level dependencies in the prompting phase performs the best, while integration in the post-processing phase performs the worst. In addition, we observe that static analysis yields different improvements for dynamic and static languages: the best combination for Java is prompting-phase with decoding-phase integration, while the best combination for Python is prompting-phase with post-processing-phase integration, given the limitations of statically analyzing dynamic languages. Finally, we find that RAG and static analysis integration are complementary and that their combination is cost-effective.
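The three integration points named in the abstract (prompting, decoding, post-processing) can be pictured with a toy sketch. This is not the STALL+ implementation: `RepoIndex`, `build_prompt`, `constrain_candidates`, and `post_process` are hypothetical names introduced only for illustration, no real LLM or static analyzer is called, and the "static analysis results" are hand-written dictionaries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepoIndex:
    """Hypothetical stand-in for static-analysis results over a repository."""
    # file name -> signatures declared in the files it depends on
    file_deps: dict[str, list[str]] = field(default_factory=dict)
    # class name -> member names known to exist on that class
    members: dict[str, set[str]] = field(default_factory=dict)


def build_prompt(index: RepoIndex, file_name: str, unfinished_code: str) -> str:
    """Prompting phase: prepend file-level dependency context as comments."""
    sigs = index.file_deps.get(file_name, [])
    context = "\n".join(f"# {sig}" for sig in sigs)
    return f"{context}\n{unfinished_code}" if context else unfinished_code


def constrain_candidates(index: RepoIndex, receiver_type: str,
                         ranked_candidates: list[str]) -> list[str]:
    """Decoding phase: drop candidates that are not members of the receiver."""
    valid = index.members.get(receiver_type)
    if valid is None:  # no static information: keep the model's ranking
        return list(ranked_candidates)
    return [c for c in ranked_candidates if c in valid]


def post_process(index: RepoIndex, receiver_type: str,
                 completion: str, fallback: str) -> str:
    """Post-processing phase: replace a completion naming an unknown member."""
    valid = index.members.get(receiver_type)
    if valid is None or completion in valid:
        return completion
    return fallback


# Toy usage: the "model output" is a hard-coded ranked list (an assumption).
index = RepoIndex(
    file_deps={"app.py": ["Cache.get(key)", "Cache.put(key, value)"]},
    members={"Cache": {"get", "put"}},
)
prompt = build_prompt(index, "app.py", "cache.")
kept = constrain_candidates(index, "Cache", ["fetch", "get", "put"])
fixed = post_process(index, "Cache", "fetch", fallback=kept[0])
```

In this sketch the prompting hook changes what the model sees, the decoding hook narrows what it may emit, and the post-processing hook only repairs a finished completion.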

Paper Structure (25 sections, 5 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Overview of the STALL+ static analysis integration framework
  • Figure 2: Comparison case of three strategies
  • Figure 3: Bad case categories in decoding-phase integration (prompt, ground truth, model prediction, explanation)
  • Figure 4: Combining RAG and static analysis integration strategies
  • Figure 5: Comparison example of RAG and Prompt-F