Generating High-Level Test Cases from Requirements using LLM: An Industry Study
Satoshi Masuda, Satoshi Kouzawa, Kyousuke Sezai, Hidetoshi Suhara, Yasuaki Hiruta, Kunihiro Kudou
TL;DR
This paper tackles the manual bottleneck of deriving high-level (HL) test cases from requirement documents by introducing a prompt-only method (GHL) that avoids retrieval-augmented generation (RAG). The approach first has the LLM generate the test design techniques that correspond to the requirements, and then generates HL test cases for each technique, guided only by prompts and a test strategy. Validation on Bluetooth and Mozilla datasets yields macro-recall of $0.81$ and $0.37$, respectively, which the authors take as evidence of practical feasibility for industry, while precision and test execution automation remain open challenges. Overall, the work points to a viable path toward scalable, RAG-free HL test-case generation, with clear directions for expanding datasets and integrating automated test execution.
Abstract
Currently, high-level test cases described in natural language are generated from requirement documents manually. In industry, including companies specializing in software testing, there is significant demand for generating high-level test cases from requirement documents automatically using Large Language Models (LLMs). Efforts to use LLMs for requirement analysis are underway, and in some cases retrieval-augmented generation (RAG) is employed to generate high-level test cases. In practical applications, however, a RAG must be built for the knowledge system of each specific application, which is labor-intensive. Moreover, when high-level test case generation is driven by prompts alone, there is no established way to instruct the LLM so that the generation carries over to other specifications without RAG. A method for automatically generating high-level test cases that generalizes across a wider range of requirement documents is therefore needed. In this paper, we propose a method for generating high-level (GHL) test cases from requirement documents using only prompts, without building a RAG. In the proposed method, the requirement document is first input into the LLM to generate the test design techniques corresponding to that document. Then, high-level test cases are generated for each of the generated test design techniques. Furthermore, we verify an evaluation method based on the semantic similarity of the generated high-level test cases. In the experiments, we confirmed the method using datasets from Bluetooth and Mozilla, for which both requirement documents and high-level test cases are available, achieving macro-recall of 0.81 and 0.37, respectively. We believe that the method is feasible for practical application in generating high-level test cases without using RAG.
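As a rough illustration of the two-step flow and the recall-style evaluation described in the abstract, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The prompt templates, the `llm` callable, the function names, and the matching threshold are all hypothetical. The `jaccard` token overlap is only a toy stand-in for a real semantic similarity measure. Macro-recall is assumed here to mean the unweighted mean of per-document recall, where a reference test case counts as recalled if at least one generated test case is similar enough to it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical prompt templates; the paper's actual prompts are not reproduced here.
TECHNIQUE_PROMPT = (
    "Test strategy:\n{strategy}\n\nRequirements:\n{requirements}\n\n"
    "List the test design techniques that apply, one per line."
)
CASE_PROMPT = (
    "Requirements:\n{requirements}\n\nTest design technique: {technique}\n\n"
    "Write high-level test cases for this technique, one per line."
)


def _lines(text: str) -> List[str]:
    """Split an LLM reply into non-empty, stripped lines."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def generate_hl_test_cases(
    requirements: str, strategy: str, llm: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Step 1: ask for techniques. Step 2: ask for test cases per technique."""
    reply = llm(TECHNIQUE_PROMPT.format(strategy=strategy, requirements=requirements))
    return {
        technique: _lines(
            llm(CASE_PROMPT.format(requirements=requirements, technique=technique))
        )
        for technique in _lines(reply)
    }


def jaccard(a: str, b: str) -> float:
    """Toy stand-in for semantic similarity: overlap of lowercase token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def recall(
    reference: Sequence[str],
    generated: Sequence[str],
    sim: Callable[[str, str], float] = jaccard,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> float:
    """Fraction of reference cases matched by at least one generated case."""
    if not reference:
        return 0.0
    hits = sum(1 for r in reference if any(sim(r, g) >= threshold for g in generated))
    return hits / len(reference)


def macro_recall(
    pairs: Sequence[Tuple[Sequence[str], Sequence[str]]],
    sim: Callable[[str, str], float] = jaccard,
    threshold: float = 0.5,
) -> float:
    """Unweighted mean of per-document recall over (reference, generated) pairs."""
    scores = [recall(ref, gen, sim, threshold) for ref, gen in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

In practice, `llm` would wrap whichever model client is in use, and `sim` would be replaced by an embedding-based similarity. Passing both in as plain callables keeps the sketch independent of any particular vendor API or embedding library.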
