Automated Unit Test Refactoring
Yi Gao, Xing Hu, Xiaohu Yang, Xin Xia
TL;DR
This work tackles test smells in unit tests by introducing UTRefactor, a context-enhanced, LLM-based framework that uses an external knowledge base and a domain-specific language (DSL) to automate test refactoring in Java. The system has three parts: preprocessing (test extraction, smell detection, and test context collection), a knowledge base (standardized smell definitions and 13 DSL-based refactoring rules), and chain-of-thought (CoT) guided refactoring with a checkpoint mechanism aimed at complete smell elimination. Evaluated on 879 tests from six open-source Java projects, UTRefactor reduced the number of test smells from 2,375 to 265 (an 89% reduction) and outperformed the baseline approaches, a generic LLM prompt and TESTAXE, across multiple smell categories. The results show improved test quality through maintainable refactorings, with practical implications for automated testing pipelines; open challenges include handling certain smells and extending the approach to other languages.
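One way to picture the checkpoint mechanism is as a detect, refactor, and verify loop. The minimal Java sketch below assumes a smell detector and an LLM-backed refactoring step hidden behind two hypothetical interfaces (`SmellDetector`, `Refactorer`); the toy implementations simply treat whitespace-separated tokens starting with `SMELL_` as smells. The hypothetical `refactorWithCheckpoints` loop fixes one smell per step, re-runs detection, and keeps the result as the new checkpoint only if the smell count drops; otherwise it rolls back to the previous checkpoint and skips that smell. This is an illustrative reading of "checkpoint", not UTRefactor's actual implementation, and a practical version would presumably also compile and run each refactored test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a detect -> refactor -> verify loop with checkpoints.
// None of these names come from UTRefactor; they only illustrate the idea.
public class CheckpointRefactoring {

    // Stand-in for a test smell detector: returns one entry per smell found.
    interface SmellDetector {
        List<String> detect(String testSource);
    }

    // Stand-in for an LLM call asked to remove one named smell from a test.
    interface Refactorer {
        String refactor(String testSource, String smell);
    }

    // Toy detector: every whitespace-separated token starting with "SMELL_" is one smell.
    static final SmellDetector TOY_DETECTOR = src -> {
        List<String> found = new ArrayList<>();
        for (String token : src.split("\\s+")) {
            if (token.startsWith("SMELL_")) {
                found.add(token);
            }
        }
        return found;
    };

    // Toy "LLM": replaces the first occurrence of the smell, but cannot fix SMELL_HARD.
    static final Refactorer TOY_REFACTORER = (src, smell) ->
            smell.equals("SMELL_HARD") ? src : src.replaceFirst(Pattern.quote(smell), "FIXED");

    // Fixes one smell per step. A step's output becomes the new checkpoint only if it
    // lowers the smell count; otherwise the loop keeps the previous checkpoint
    // (rollback) and does not retry that smell.
    static String refactorWithCheckpoints(String source, SmellDetector detector,
                                          Refactorer refactorer, int maxSteps) {
        String checkpoint = source;
        List<String> skipped = new ArrayList<>();
        for (int step = 0; step < maxSteps; step++) {
            List<String> current = detector.detect(checkpoint);
            List<String> untried = new ArrayList<>(current);
            untried.removeAll(skipped);
            if (untried.isEmpty()) {
                break; // nothing left that we have not already given up on
            }
            String smell = untried.get(0);
            String candidate = refactorer.refactor(checkpoint, smell);
            if (detector.detect(candidate).size() < current.size()) {
                checkpoint = candidate; // accept and advance the checkpoint
            } else {
                skipped.add(smell); // roll back: keep the old checkpoint
            }
        }
        return checkpoint;
    }

    public static void main(String[] args) {
        String test = "setUp SMELL_A assertEquals SMELL_B SMELL_HARD";
        System.out.println(refactorWithCheckpoints(test, TOY_DETECTOR, TOY_REFACTORER, 10));
    }
}
```

Running `main` prints `setUp FIXED assertEquals FIXED SMELL_HARD`: the two fixable smells are removed one checkpoint at a time, and the smell the toy refactorer cannot handle is left in place instead of being retried forever. The `maxSteps` bound is a second safeguard against an unproductive model.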
Abstract
Test smells arise from poor design practices and insufficient domain knowledge; they can lower the quality of test code and make it harder to maintain and update. Manually refactoring test smells is time-consuming and error-prone, which highlights the need for automated approaches. Current rule-based refactoring methods often struggle in scenarios not covered by their predefined rules and lack the flexibility to handle diverse cases effectively. In this paper, we propose UTRefactor, a context-enhanced, LLM-based framework for automatic test refactoring in Java projects. UTRefactor extracts relevant context from test code and leverages an external knowledge base that includes test smell definitions, descriptions, and DSL-based refactoring rules. By simulating the manual refactoring process through a chain-of-thought approach, UTRefactor guides the LLM to eliminate test smells step by step, ensuring both accuracy and consistency throughout the refactoring. Additionally, we implement a checkpoint mechanism to facilitate comprehensive refactoring, particularly when multiple smells are present. We evaluate UTRefactor on 879 tests from six open-source Java projects, where it reduces the number of test smells from 2,375 to 265, an 89% reduction. UTRefactor outperforms direct LLM-based refactoring methods by 61.82% in smell elimination and significantly surpasses the performance of a rule-based test smell refactoring tool. Our results demonstrate the effectiveness of UTRefactor in enhancing test code quality while minimizing manual involvement.
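As a quick check of the headline figure, the reduction rate follows directly from the before and after smell counts reported in the abstract:

```latex
\text{reduction} = \frac{2375 - 265}{2375} = \frac{2110}{2375} \approx 0.888
```

That is, about 88.8% of the detected smells were removed, which rounds to the reported 89%.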
