System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT

Shreya Bhatia; Tarushi Gandhi; Dhruv Kumar; Pankaj Jalote

System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT

Shreya Bhatia, Tarushi Gandhi, Dhruv Kumar, Pankaj Jalote

TL;DR

This study investigates using LLMs to generate system test-case designs directly from Software Requirements Specifications (SRS). A two-stage prompt-chaining approach with ChatGPT-4o Turbo was applied to five real-world SRS documents, generating 10–11 test cases per use case and evaluating them through developer feedback. Findings show that about 87.7% of generated test cases are valid, with 15.2% representing previously overlooked but valid tests, and 2.6% identified as redundant; redundancy detection achieved partial alignment with developers but included false positives. The work demonstrates substantial potential for LLM-assisted test design to improve coverage and efficiency, while highlighting challenges in redundancy precision and the need for broader datasets and architecture-aware prompting in future work.

Abstract

System testing is essential in any software development project to ensure that the final products meet the requirements. Creating comprehensive test cases for system testing from requirements is often challenging and time-consuming. This paper explores the effectiveness of using Large Language Models (LLMs) to generate test case designs from Software Requirements Specification (SRS) documents. In this study, we collected the SRS documents of five software engineering projects containing functional and non-functional requirements, which were implemented, tested, and delivered by respective developer teams. For generating test case designs, we used ChatGPT-4o Turbo model. We employed prompt-chaining, starting with an initial context-setting prompt, followed by prompts to generate test cases for each use case. We assessed the quality of the generated test case designs through feedback from the same developer teams as mentioned above. Our experiments show that about 87 percent of the generated test cases were valid, with the remaining 13 percent either not applicable or redundant. Notably, 15 percent of the valid test cases were previously not considered by developers in their testing. We also tasked ChatGPT with identifying redundant test cases, which were subsequently validated by the respective developers to identify false positives and to uncover any redundant test cases that may have been missed by the developers themselves. This study highlights the potential of leveraging LLMs for test generation from the Requirements Specification document and also for assisting developers in quickly identifying and addressing redundancies, ultimately improving test suite quality and efficiency of the testing procedure.

System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT

TL;DR

Abstract

System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT

TL;DR

Abstract

Paper Structure

Table of Contents