Domain Adaptation for Code Model-based Unit Test Case Generation
Jiho Shin, Sepehr Hashtroudi, Hadi Hemmati, Song Wang
TL;DR
The paper tackles domain shift in neural unit test generation by introducing project-level domain adaptation for CodeT5. It first fine-tunes CodeT5 on the test generation task and then adapts it to each target project using that project's developer-written tests, so that the model produces compilable, test-adequate unit tests. Using Methods2Test for task fine-tuning and Defects4J for adaptation and evaluation, the experiments show large gains over task-only fine-tuning and over strong baselines such as GPT-4 and A3Test, with notable improvements in parse and compile rates, line coverage, and mutation score. The approach is also complementary to EvoSuite: adding its fast, line-covering generated tests to EvoSuite's tests raises overall coverage and mutation score, augmenting existing search-based software testing (SBST) methods in practice.
Abstract
Recently, deep learning-based test case generation approaches have been proposed to automate the generation of unit test cases. In this study, we leverage Transformer-based code models to generate unit tests with the help of Domain Adaptation (DA) at the project level. Specifically, we use CodeT5, a relatively small language model trained on source code, and fine-tune it on the test generation task. Then, we apply domain adaptation to each target project's data to learn project-specific knowledge (project-level DA). We use the Methods2Test dataset to fine-tune CodeT5 for the test generation task and the Defects4J dataset for project-level domain adaptation and evaluation. We compare our approach with (a) CodeT5 fine-tuned on test generation without DA, (b) the A3Test tool, and (c) GPT-4 on five projects from the Defects4J dataset. The results show that tests generated using DA increase line coverage by 18.62%, 19.88%, and 18.02% and mutation score by 16.45%, 16.01%, and 12.99% compared to baselines (a), (b), and (c), respectively. The results also show consistent improvements in parse rate, compile rate, BLEU, and CodeBLEU. In addition, we show that our approach can serve as a complementary solution alongside existing search-based test generation tools such as EvoSuite, increasing overall line coverage and mutation score by an average of 34.42% and 6.8%, respectively.
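To make the complementary-use claim concrete, one plausible way to measure a combined suite is to treat each suite's results as sets: the combined suite covers a line if either suite covers it, and kills a mutant if either suite kills it. The sketch below assumes per-line coverage and per-mutant kill data are already available from coverage and mutation-testing tools, and that the two suites do not interfere with each other when run together. The `SuiteResult` structure, the function names, and all numbers are hypothetical toy values for illustration, not the study's data or implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SuiteResult:
    """Hypothetical per-suite results for one class under test."""
    covered_lines: set[int]   # line numbers executed by the suite
    killed_mutants: set[str]  # IDs of mutants detected by the suite


def line_coverage(covered: set[int], total_lines: int) -> float:
    # Fraction of executable lines covered by at least one test.
    return len(covered) / total_lines


def mutation_score(killed: set[str], total_mutants: int) -> float:
    # Fraction of generated mutants killed by at least one test.
    return len(killed) / total_mutants


def combine(a: SuiteResult, b: SuiteResult) -> SuiteResult:
    # Assumes no interference between suites: a line is covered (or a
    # mutant killed) by the combined suite iff either suite covers (kills) it.
    return SuiteResult(
        covered_lines=a.covered_lines | b.covered_lines,
        killed_mutants=a.killed_mutants | b.killed_mutants,
    )


# Toy example: 10 executable lines and 4 mutants in the class under test.
TOTAL_LINES, TOTAL_MUTANTS = 10, 4
da_suite = SuiteResult({1, 2, 3, 4, 5}, {"m1", "m2"})        # hypothetical DA-generated tests
evo_suite = SuiteResult({4, 5, 6, 7, 8, 9}, {"m2", "m3"})    # hypothetical EvoSuite tests
combined = combine(da_suite, evo_suite)

print(line_coverage(da_suite.covered_lines, TOTAL_LINES))      # 0.5
print(line_coverage(evo_suite.covered_lines, TOTAL_LINES))     # 0.6
print(line_coverage(combined.covered_lines, TOTAL_LINES))      # 0.9
print(mutation_score(combined.killed_mutants, TOTAL_MUTANTS))  # 0.75
```

Because the two suites overlap only partially, the union covers more lines and kills more mutants than either suite alone. This is the intuition behind using a fast, model-based generator to augment, rather than replace, a search-based tool.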
