Together We Go Further: LLMs and IDE Static Analysis for Extract Method Refactoring
Dorin Pomian, Abhiram Bellur, Malinda Dilhara, Zarina Kurbatova, Egor Bogomolov, Timofey Bryksin, Danny Dig
TL;DR
The paper tackles the challenge of Extract Method refactoring by integrating Large Language Models with IDE static analysis in an end-to-end EM-Assist workflow. It combines iterative LLM prompting, rigorous filtering of invalid and non-useful suggestions via static analysis and program slicing, and a principled ranking mechanism to present high-quality candidates that developers can apply within the IDE. Empirical evaluation on large, real-world corpora shows that EM-Assist outperforms state-of-the-art static-analysis and ML-based tools in recall, and developer surveys indicate strong practical usefulness and adoption potential. The work demonstrates a viable path for AI-assisted refactoring that respects developer practices and preserves code correctness, with open data and tooling to support reproducibility. Overall, EM-Assist enables a collaborative human-AI refactoring process that expands what developers can achieve with AI while maintaining safety and control in software maintenance.
Abstract
Long methods that encapsulate multiple responsibilities within a single method are challenging to maintain. Choosing which statements to extract into new methods has been the target of many research tools. Despite steady improvements, these tools often fail to generate refactorings that align with developers' preferences and acceptance criteria. Given that Large Language Models (LLMs) have been trained on large code corpora, if we harness their familiarity with the way developers form functions, we could suggest refactorings that developers are likely to accept. In this paper, we advance the science and practice of refactoring by synergistically combining the insights of LLMs with the power of IDEs to perform Extract Method (EM). Our formative study on 1752 EM scenarios revealed that LLMs are very effective at giving expert suggestions, yet they are unreliable: up to 76.3% of the suggestions are hallucinations. We designed a novel approach that removes hallucinations from the candidates suggested by LLMs, then further enhances and ranks suggestions based on static analysis techniques from program slicing, and finally leverages the IDE to execute refactorings correctly. We implemented this approach in an IntelliJ IDEA plugin called EM-Assist. We empirically evaluated EM-Assist on a diverse corpus that replicates 1752 actual refactorings from open-source projects. We found that EM-Assist outperforms previous state-of-the-art tools: EM-Assist suggests the developer-performed refactoring in 53.4% of cases, improving over the recall rate of 39.4% for previous best-in-class tools. Furthermore, we conducted firehouse surveys with 16 industrial developers and suggested refactorings on their recent commits. 81.3% of them agreed with the recommendations provided by EM-Assist.
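To make the filter-then-rank idea in the abstract concrete, the following is a minimal Python sketch and not the EM-Assist implementation. It assumes that each LLM suggestion is a line range inside the host method, that a suggestion pointing outside the method or covering almost all of it counts as invalid or not useful, and that suggestions repeated across several LLM responses should rank higher. The names `Candidate`, `is_plausible`, `rank`, and the `max_fraction` threshold are hypothetical and chosen only for illustration. The real tool relies on IDE static analysis and program slicing, which this sketch does not model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A suggested extraction range (1-based, inclusive line numbers). Hypothetical type."""
    start: int
    end: int


def is_plausible(c, method_start, method_end, max_fraction=0.8):
    """Simplified stand-in for the static checks (assumption, not the paper's rules).

    Rejects ranges that fall outside the host method (treated here as
    hallucinations) and ranges that cover most of the method body
    (treated here as not useful).
    """
    if not (method_start <= c.start <= c.end <= method_end):
        return False
    body_len = method_end - method_start + 1
    size = c.end - c.start + 1
    return size < body_len * max_fraction


def rank(suggestions, method_start, method_end, top_k=3):
    """Drop implausible candidates, then order the rest by how often they recur.

    Ties are broken by larger range first, then by earlier start line.
    The ordering criteria are illustrative assumptions.
    """
    valid = [c for c in suggestions if is_plausible(c, method_start, method_end)]
    counts = Counter(valid)
    ordered = sorted(
        counts,
        key=lambda c: (-counts[c], -(c.end - c.start), c.start),
    )
    return ordered[:top_k]


if __name__ == "__main__":
    # Suggestions pooled from several hypothetical LLM responses
    # for a method spanning lines 10..40.
    pooled = [
        Candidate(12, 18),
        Candidate(12, 18),
        Candidate(20, 24),
        Candidate(5, 40),   # starts before the method: rejected
        Candidate(10, 40),  # covers the whole method: rejected
    ]
    for candidate in rank(pooled, method_start=10, method_end=40):
        print(candidate)
```

In this sketch the surviving candidates would then be handed to the IDE, which performs the actual extraction, so that correctness does not depend on the LLM output itself.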
