Together We Go Further: LLMs and IDE Static Analysis for Extract Method Refactoring

Dorin Pomian; Abhiram Bellur; Malinda Dilhara; Zarina Kurbatova; Egor Bogomolov; Timofey Bryksin; Danny Dig

TL;DR

The paper tackles Extract Method (EM) refactoring by integrating large language models (LLMs) with IDE static analysis in an end-to-end workflow called EM-Assist. The approach combines iterative LLM prompting, static-analysis filtering of invalid and non-useful suggestions, enhancement and ranking of the remaining candidates based on program slicing, and application of the chosen refactoring within the IDE. Empirical evaluation on large, real-world corpora shows that EM-Assist achieves higher recall than state-of-the-art static-analysis and ML-based tools, and developer surveys indicate strong practical usefulness and adoption potential. The work demonstrates a viable path for AI-assisted refactoring that respects developer practices and preserves code correctness, with open data and tooling to support reproducibility. Overall, EM-Assist enables a collaborative human-AI refactoring process that extends what developers can achieve with AI while keeping software maintenance safe and under developer control.
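
The generate, filter, and rank workflow described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not EM-Assist's implementation: LLM output is assumed to be already parsed into line ranges, the hypothetical `is_extractable` callback stands in for the IDE's static checks, the "not useful" rule (fewer than `min_lines` lines, or the entire method body) is an assumed heuristic, and ranking by how often the LLM repeats a range is a stand-in for the paper's own ranking mechanism. `Candidate`, `pipeline`, and all parameter names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Candidate:
    """Hypothetical EM suggestion: an inclusive line range in the host method."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def pipeline(
    samples: Iterable[list[Candidate]],
    method_start: int,
    method_end: int,
    is_extractable: Callable[[Candidate], bool],
    min_lines: int = 2,
    top_k: int = 5,
) -> list[Candidate]:
    """Pool, filter, and rank EM candidates (illustrative sketch only)."""
    # 1. Pool suggestions from repeated LLM calls, counting repeats of each range.
    counts = Counter(cand for sample in samples for cand in sample)

    body_len = method_end - method_start + 1
    kept: list[Candidate] = []
    for cand in counts:
        # 2. Invalid: the range falls outside the host method, or the
        #    (stubbed) IDE check says it cannot be extracted.
        if not method_start <= cand.start <= cand.end <= method_end:
            continue
        if not is_extractable(cand):
            continue
        # 3. Assumed "not useful": too short, or the entire method body.
        if cand.length < min_lines or cand.length >= body_len:
            continue
        kept.append(cand)

    # 4. Assumed ranking: most frequently proposed first; ties favor longer,
    #    then earlier, ranges.
    kept.sort(key=lambda c: (-counts[c], -c.length, c.start))
    return kept[:top_k]


if __name__ == "__main__":
    samples = [
        [Candidate(12, 18), Candidate(3, 40)],   # 3-40 is the whole body
        [Candidate(12, 18), Candidate(50, 55)],  # 50-55 lies outside the method
        [Candidate(20, 20), Candidate(25, 30)],  # 20-20 is a single line
    ]
    ranked = pipeline(samples, method_start=3, method_end=40,
                      is_extractable=lambda c: True)
    print(ranked)  # [Candidate(start=12, end=18), Candidate(start=25, end=30)]
```

Counting repeats with `Counter` means that sampling the LLM several times gives the ranking step more signal for free; in the sketch, a range proposed in two samples outranks one proposed only once.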

Abstract

Long methods that encapsulate multiple responsibilities within a single method are challenging to maintain. Choosing which statements to extract into new methods has been the target of many research tools. Despite steady improvements, these tools often fail to generate refactorings that align with developers' preferences and acceptance criteria. Because Large Language Models (LLMs) have been trained on large code corpora, harnessing their familiarity with the way developers form functions could yield refactoring suggestions that developers are likely to accept. In this paper, we advance the science and practice of refactoring by synergistically combining the insights of LLMs with the power of IDEs to perform Extract Method (EM). Our formative study on 1752 EM scenarios revealed that LLMs are very effective at giving expert suggestions, yet they are unreliable: up to 76.3% of the suggestions are hallucinations. We designed a novel approach that removes hallucinations from the candidates suggested by LLMs, then further enhances and ranks suggestions based on static analysis techniques from program slicing, and finally leverages the IDE to execute refactorings correctly. We implemented this approach in an IntelliJ IDEA plugin called EM-Assist. We empirically evaluated EM-Assist on a diverse corpus that replicates 1752 actual refactorings from open-source projects. We found that EM-Assist outperforms previous state-of-the-art tools: EM-Assist suggests the developer-performed refactoring in 53.4% of cases, compared with a recall of 39.4% for the previous best-in-class tools. Furthermore, we conducted firehouse surveys with 16 industrial developers, suggesting refactorings on their recent commits; 81.3% of them agreed with the recommendations provided by EM-Assist.
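
The recall figures above measure how often the developer-performed extraction appears among a tool's suggestions, and Figure 5 tracks the top-5 variant, Recall@5. A minimal sketch of such a metric follows. It assumes each suggestion is an inclusive line range and that a hit requires an exact match with the developer's extraction; the paper's actual matching criterion may be more lenient. `recall_at_k` and the `Range` alias are hypothetical names.

```python
from typing import Sequence

Range = tuple[int, int]  # hypothetical: inclusive (start_line, end_line)


def recall_at_k(
    ranked: Sequence[Sequence[Range]],
    oracle: Sequence[Range],
    k: int = 5,
) -> float:
    """Share of scenarios whose developer-performed extraction is in the top k."""
    if len(ranked) != len(oracle):
        raise ValueError("expected one ranked list per oracle refactoring")
    if not oracle:
        return 0.0
    # Assumed exact-match criterion: a hit needs the identical line range.
    hits = sum(
        1 for suggestions, truth in zip(ranked, oracle)
        if truth in suggestions[:k]
    )
    return hits / len(oracle)


if __name__ == "__main__":
    ranked = [[(4, 9), (12, 18)], [(1, 3)], [(7, 7), (2, 5), (10, 20)]]
    oracle = [(12, 18), (5, 8), (10, 20)]
    print(recall_at_k(ranked, oracle, k=2))  # 1 of 3 scenarios hit
    print(recall_at_k(ranked, oracle, k=3))  # 2 of 3 scenarios hit
```

Raising `k` can only keep or increase recall, because a larger cutoff admits more suggestions per scenario; in the example, the third oracle range is ranked third, so it counts as a hit only once `k` reaches 3.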

Paper Structure (29 sections, 2 equations, 4 figures, 4 tables)

Introduction
Motivating Example
Technique
Generating EM Suggestions
Prompt engineering
Generating an extensive array of suggestions
Removing Invalid EM Suggestions
Removing Refactoring Suggestions That Are Not Useful
Enhancing Refactoring Suggestions
Ranking Refactoring Suggestions
Evaluation
Datasets
RQ1: Effectiveness of LLMs
Subject Systems and Experimental Setup
Results
...and 14 more sections

Figures (4)

Figure 1: The numbered code snippets represent (1) an Extract Function refactoring in the project Neo4j, commit a05a8c5, (2) a suggestion made by a static analysis tool, (3) an invalid suggestion from an LLM, and (4) a suggestion from an LLM that is not useful.
Figure 2: The workflow of generating refactoring suggestions and then applying them with EM-Assist.
Figure 4: The capabilities of LLMs in generating refactoring suggestions. The plots show the number of suggestions per host method (note the logarithmic scale).
Figure 5: Change in Recall@5 with temperature and number of iterations for the Community Corpus

Theorems & Definitions (5)

Definition 3.1
Definition 3.2
Definition 3.3
Definition 3.4
Definition 3.5