
How to Refactor this Code? An Exploratory Study on Developer-ChatGPT Refactoring Conversations

Eman Abdullah AlOmar, Anushkrishna Venkatakrishnan, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni

TL;DR

This study investigates how developers articulate refactoring needs when interacting with a large language model (ChatGPT) by mining 17,913 developer-ChatGPT conversations from the DevGPT dataset. It combines dataset preprocessing, multilingual translation, and thematic analysis to extract 43 refactoring patterns and to categorize ChatGPT's focus into internal/external quality attributes and code smells. Findings show that developers provide either generic or code-centric prompts, while ChatGPT tends to state the refactoring intention explicitly and to emphasize design-quality attributes, yet it can misinterpret larger codebases. The work highlights prompt engineering as pivotal, reveals variability in learning settings (zero-shot vs. few-shot), and suggests improvements for integrating AI-assisted refactoring tools into software development workflows.
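To make the "text mining" step concrete, here is a minimal sketch of keyword-based filtering over prompt/answer pairs, plus a frequency count of matched terms. It assumes a hypothetical `Turn` record and a small, hypothetical list of keyword stems (`REFACTORING_TERMS`); these are illustrative only and are not the 43 patterns reported in the study, nor the DevGPT dataset's actual schema.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List

# Hypothetical keyword stems for illustration; NOT the study's 43 extracted patterns.
REFACTORING_TERMS = ["refactor", "restructur", "clean up", "simplif", "readab"]

# One case-insensitive pattern that matches any of the stems.
PATTERN = re.compile("|".join(re.escape(t) for t in REFACTORING_TERMS), re.IGNORECASE)


@dataclass
class Turn:
    """Hypothetical record for one developer prompt and ChatGPT's answer."""
    prompt: str
    answer: str


def is_refactoring_related(turn: Turn) -> bool:
    """True if either the prompt or the answer mentions a refactoring term."""
    return bool(PATTERN.search(turn.prompt) or PATTERN.search(turn.answer))


def filter_turns(turns: List[Turn]) -> List[Turn]:
    """Keep only the turns that look refactoring-related."""
    return [t for t in turns if is_refactoring_related(t)]


def term_frequencies(turns: List[Turn]) -> Counter:
    """Count how often each matched stem appears in developer prompts."""
    counts: Counter = Counter()
    for t in turns:
        for m in PATTERN.finditer(t.prompt):
            counts[m.group(0).lower()] += 1
    return counts


turns = [
    Turn("How to refactor this code?", "Here is a cleaner version."),
    Turn("Fix this typo", "Done."),
    Turn("Please simplify and refactor this", "Sure."),
]
print(len(filter_turns(turns)))  # 2
print(term_frequencies(turns))   # Counter({'refactor': 2, 'simplif': 1})
```

A fixed keyword list is the simplest possible filter; it will miss paraphrased requests and admit false positives, which is one reason a manual thematic analysis would still be needed on top of it.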

Abstract

Large Language Models (LLMs), like ChatGPT, have gained widespread popularity and usage in various software engineering tasks, including refactoring, testing, code review, and program comprehension. Although recent studies have examined refactoring documentation in commit messages, issues, and code review, little is known about how developers articulate their refactoring needs when interacting with ChatGPT. In this paper, our goal is to explore conversations between developers and ChatGPT related to refactoring, to better understand how developers identify areas for improvement in code and how ChatGPT addresses developers' needs. Our approach relies on text mining refactoring-related conversations from 17,913 ChatGPT prompts and responses and on investigating developers' explicit refactoring intentions. Our results reveal that (1) developer-ChatGPT conversations commonly involve both generic and specific terms/phrases; (2) developers often make generic refactoring requests, while ChatGPT typically states the refactoring intention; and (3) developers prompt ChatGPT for refactoring under various learning settings (e.g., zero-shot and few-shot). We envision that our findings contribute to a broader understanding of the collaboration between developers and AI models in the context of code refactoring, with implications for model improvement, tool development, and best practices in software engineering.
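The learning settings mentioned above differ mainly in whether worked examples precede the code to be refactored. The sketch below assumes a hypothetical prompt template (`INSTRUCTION` and the "before/after" layout are illustrative, not taken from the study's conversations): with no examples it produces a zero-shot prompt, and with one or more (before, after) pairs it produces a few-shot prompt.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical instruction text; not quoted from any conversation in the study.
INSTRUCTION = "Refactor the following code to improve readability. Keep its behavior unchanged."


def build_prompt(code: str, examples: Optional[Sequence[Tuple[str, str]]] = None) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples).

    Each example is a (before, after) pair showing one refactoring.
    """
    parts = [INSTRUCTION]
    for i, (before, after) in enumerate(examples or [], start=1):
        parts.append(f"Example {i} (before):\n{before}\nExample {i} (after):\n{after}")
    parts.append(f"Code:\n{code}")
    return "\n\n".join(parts)


zero_shot = build_prompt("x = 1")
few_shot = build_prompt("x = 1", [("a = 1; b = a", "b = 1")])
print(zero_shot)
print(few_shot)
```

Keeping the instruction identical across both settings isolates the effect of the examples, which is the variable that distinguishes zero-shot from few-shot prompting.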

Paper Structure (9 sections, 4 figures, 2 tables)

This paper contains 9 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Example of a ChatGPT conversation in the context of a GitHub issue about refactoring.
  • Figure 2: Overview of our experiment design.
  • Figure 3: Popular refactoring textual patterns.
  • Figure 4: ChatGPT conversation patterns to refactor code.