Generating REST API Tests With Descriptive Names
Philip Garrett, Juan P. Galeotti, Andrea Arcuri, Alexander Poth, Olsi Rrjolli
TL;DR
The paper tackles the problem of non-descriptive names for automatically generated REST API tests. It introduces three deterministic, rule-based naming strategies and evaluates them in a comparison of eight techniques in total, including LLM-based methods. Using EvoMaster to generate REST-Assured tests and combining human surveys with an industrial study at Volkswagen, the authors show that the testCondition approach yields the highest readability among deterministic methods and performs on par with Gemini and GPT-4o (and better than GPT-3.5). The findings indicate that lightweight deterministic naming can match or exceed LLM-based approaches in practice while being faster, cheaper, and less security-sensitive. The work demonstrates practical utility for developer-focused API testing and motivates future work on richer semantic annotations and cross-language applicability.
Abstract
Automated test generation has become a key technique for ensuring software quality, particularly in modern API-based architectures. However, automatically generated test cases are typically assigned non-descriptive names (e.g., test0, test1), which reduces their readability and hinders comprehension and maintenance. In this work, we present three novel deterministic techniques to generate REST API test names. We then compare eight techniques in total for generating descriptive names for REST API tests automatically produced by the fuzzer EvoMaster, using 10 test cases generated for 9 different open-source APIs. The eight techniques include rule-based heuristics and large language model (LLM)-based approaches. Their effectiveness was empirically evaluated through two surveys (involving up to 39 people recruited via LinkedIn). Our results show that a rule-based approach achieves the highest clarity ratings among deterministic methods, performs on par with state-of-the-art LLMs such as Gemini and GPT-4o, and significantly outperforms GPT-3.5. To further evaluate the practical impact of our results, we conducted an industrial case study with practitioners who actively use EvoMaster at Volkswagen AG. The accompanying developer questionnaire covered the use of EvoMaster on four different APIs by four different users, for a total of 74 evaluated test cases. Feedback from practitioners further confirms that the descriptive names produced by the rule-based approach improve test suite readability. These findings highlight that lightweight, deterministic techniques can serve as effective alternatives to computationally expensive and security-sensitive LLM-based approaches for automated system-level test naming, providing a practical step toward more developer-friendly API test generation.
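To make the idea of deterministic, rule-based naming concrete, the following is a minimal sketch of how a descriptive name could be derived from the HTTP method, the endpoint path, and the returned status code of a generated test. It is an illustration only and does not reproduce the paper's naming rules: the function descriptive_name, its parameters, and the resulting name format are all hypothetical.

```python
import re


def descriptive_name(method: str, path: str, status: int) -> str:
    """Build a test name from a call's method, path, and status code.

    Hypothetical rule (an assumption, not taken from the paper):
    test_<method>On<LastStaticPathSegment>Returns<status>
    """
    # Split the path into non-empty segments, e.g. "/api/users/{id}" -> ["api", "users", "{id}"].
    segments = [s for s in path.strip("/").split("/") if s]
    # Use the last segment that is not a path parameter such as "{id}".
    resource = next(
        (s for s in reversed(segments) if not s.startswith("{")), "root"
    )
    # Turn "pet-store" into "PetStore" so the name stays a valid identifier.
    words = re.findall(r"[A-Za-z0-9]+", resource)
    resource_camel = "".join(w[:1].upper() + w[1:] for w in words)
    return f"test_{method.lower()}On{resource_camel}Returns{status}"


if __name__ == "__main__":
    print(descriptive_name("GET", "/api/users/{id}", 404))  # test_getOnUsersReturns404
    print(descriptive_name("POST", "/pet-store", 201))      # test_postOnPetStoreReturns201
```

Because the name depends only on the inputs, the same test always receives the same name, with no model call and no test data leaving the machine. These are the properties the abstract contrasts with LLM-based naming.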
