Requirements Coverage-Guided Minimization for Natural Language Test Cases
Rongqi Pan, Feifei Niu, Lionel C. Briand, Hanyang Hu
TL;DR
This work addresses redundancy in requirement-based automotive test suites by introducing RTM, a test suite minimization approach that preserves full requirement coverage under a fixed minimization budget. RTM combines three preprocessing methods, seven text embedding techniques, and three distance functions to convert natural language (NL) test cases into vector representations and measure their similarity. A genetic algorithm with coverage-preserving initialization strategies then selects a diverse subset of test cases that matches the budget. On an industrial automotive dataset, RTM consistently achieves a higher fault detection rate (FDR) than baseline techniques while maintaining 100% requirement coverage, and its performance is shown to be robust to varying redundancy levels and scalable in runtime. The study also provides open replication materials to foster reproducibility and further research in requirement-driven testing under practical constraints.
Abstract
As software systems evolve, test suites tend to grow in size and often contain redundant test cases. Such redundancy increases testing effort, time, and cost. Test suite minimization (TSM) aims to eliminate this redundancy while preserving key properties such as requirement coverage and fault detection capability. In this paper, we propose RTM (Requirement coverage-guided Test suite Minimization), a novel TSM approach designed for requirement-based testing (validation). RTM reduces test suite redundancy while ensuring full requirement coverage and a high fault detection rate (FDR) under a fixed minimization budget. Based on common practice in critical systems where functional safety is important, we assume that test cases are specified in natural language and traced to requirements before being implemented. RTM preprocesses test cases using three different preprocessing methods and then converts them into vector representations using seven text embedding techniques. Similarity values between vectors are computed using three distance functions. A genetic algorithm, whose population is initialized by coverage-preserving initialization strategies, is then employed to identify an optimized subset of diverse test cases that matches the set budget. We evaluate RTM on an industrial automotive system dataset comprising 736 system test cases and 54 requirements. Experimental results show that RTM consistently outperforms baseline techniques in terms of FDR across different minimization budgets while maintaining full requirement coverage. Furthermore, we investigate the impact of test suite redundancy levels on the effectiveness of TSM, providing new insights into optimizing requirement-based test suites under practical constraints.
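The pipeline described above (vector representations, a distance function, coverage-preserving initialization, and selection of a diverse subset of budget size) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the function names, the toy vectors and traceability data, the use of cosine distance as the distance function, mean pairwise distance as the diversity objective, and the greedy-random construction of individuals. It also stops at picking the best individual from an initial population, so it stands in for the genetic algorithm rather than implementing its crossover and mutation.

```python
import random
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def covers_all(subset, coverage, requirements):
    """Check that the selected tests together cover every requirement."""
    covered = set()
    for test_id in subset:
        covered |= coverage[test_id]
    return requirements <= covered


def coverage_preserving_individual(coverage, requirements, budget, rng):
    """Hypothetical initialization: cover each requirement with a random
    test that traces to it, then pad randomly up to the budget."""
    if budget > len(coverage):
        raise ValueError("budget exceeds the number of test cases")
    chosen, covered = set(), set()
    order = sorted(requirements)
    rng.shuffle(order)
    for req in order:
        if req in covered:
            continue
        candidates = [t for t in sorted(coverage) if req in coverage[t]]
        if not candidates:
            raise ValueError(f"no test case covers {req}")
        pick = rng.choice(candidates)
        chosen.add(pick)
        covered |= coverage[pick]
    if len(chosen) > budget:
        raise ValueError("budget too small for this random draw")
    remaining = [t for t in sorted(coverage) if t not in chosen]
    chosen.update(rng.sample(remaining, budget - len(chosen)))
    return frozenset(chosen)


def diversity(subset, vectors):
    """Mean pairwise distance between the selected test-case vectors."""
    pairs = list(combinations(sorted(subset), 2))
    if not pairs:
        return 0.0
    total = sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs)
    return total / len(pairs)


def best_of_initial_population(vectors, coverage, requirements, budget,
                               size=20, seed=0):
    """Stand-in for the search step: keep the most diverse initial individual."""
    rng = random.Random(seed)
    population = [
        coverage_preserving_individual(coverage, requirements, budget, rng)
        for _ in range(size)
    ]
    return max(population, key=lambda ind: diversity(ind, vectors))


# Toy data (invented): four test cases, two requirements, budget of two.
toy_vectors = {"t1": [1.0, 0.0], "t2": [0.9, 0.1],
               "t3": [0.0, 1.0], "t4": [0.1, 0.9]}
toy_coverage = {"t1": {"R1"}, "t2": {"R1"}, "t3": {"R2"}, "t4": {"R2"}}
toy_requirements = {"R1", "R2"}

selected = best_of_initial_population(
    toy_vectors, toy_coverage, toy_requirements, budget=2)
print(sorted(selected), covers_all(selected, toy_coverage, toy_requirements))
```

Because every individual is built to cover all requirements before being padded to the budget, any subset returned by this sketch keeps full requirement coverage by construction; a search procedure layered on top would need operators that preserve this property.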
