SETBVE: Quality-Diversity Driven Exploration of Software Boundary Behaviors
Sabinakhon Akbarova, Felix Dobslaw, Francisco Gomes de Oliveira Neto, Robert Feldt
TL;DR
SETBVE reframes boundary value exploration as a Quality-Diversity optimization problem in order to uncover a broad spectrum of software boundaries systematically. It combines three modular components (Sampler, Explorer, and Tracer) with a grid archive indexed by behavioral descriptors, and it achieves higher boundary diversity than AutoBVA while keeping boundary quality competitive. Across ten systems under test (SUTs), SETBVE notably increases archive coverage (RAC) at the cost of a modest reduction, or no reduction, in PD-based quality (RPD), especially as runtime scales to 600 seconds. The approach offers a flexible, reproducible framework for discovering edge-case behaviors, and it points to within-cell optimization and richer descriptors as directions for further improving practical testing outcomes.
Abstract
Software systems exhibit distinct behaviors based on input characteristics, and failures often occur at the boundaries between input domains. Traditional Boundary Value Analysis (BVA) relies on manual heuristics, while automated Boundary Value Exploration (BVE) methods typically optimize a single quality metric, risking a narrow and incomplete survey of boundary behaviors. We introduce SETBVE, a customizable, modular framework for automated black-box BVE that leverages Quality-Diversity (QD) optimization to systematically uncover and refine a broader spectrum of boundaries. SETBVE maintains an archive of boundary pairs organized by input- and output-based behavioral descriptors. It steers exploration toward underrepresented regions while preserving high-quality boundary pairs and applies local search to refine candidate boundaries. In experiments with ten integer-based functions, SETBVE outperforms the baseline in diversity, boosting archive coverage by 37 to 82 percentage points. A qualitative analysis reveals that SETBVE identifies boundary candidates the baseline misses. While the baseline method typically plateaus in both diversity and quality after 30 seconds, SETBVE continues to improve in 600-second runs, demonstrating better scalability. Even the simplest SETBVE configurations perform well in identifying diverse boundary behaviors. Our findings indicate that balancing quality with behavioral diversity can help identify more software edge-case behaviors than quality-focused approaches.
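The archive-based search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy `sut` function, the cell layout in `descriptor`, the `quality` ratio (output change divided by input distance, loosely in the spirit of the PD-based quality mentioned in the TL;DR), the 30% sampling rate, and the bisection used as the local-search step. The roles named Sampler, Explorer, and Tracer in the comments are simplified stand-ins for the components the paper names.

```python
import random
from typing import Callable, Dict, Tuple

Pair = Tuple[int, int]
Cell = Tuple[int, str, str]
Archive = Dict[Cell, Tuple[Pair, float]]


def sut(x: int) -> str:
    """Hypothetical system under test: classifies an integer."""
    if x < 0:
        return "negative"
    if x < 18:
        return "minor"
    if x < 65:
        return "adult"
    return "senior"


def quality(f: Callable[[int], str], pair: Pair) -> float:
    """Assumed quality: 1/input distance if the outputs differ, else 0."""
    a, b = pair
    if a == b or f(a) == f(b):
        return 0.0
    return 1.0 / abs(a - b)


def descriptor(f: Callable[[int], str], pair: Pair) -> Cell:
    """Assumed cell key: one input-based bucket plus both outputs."""
    a, b = pair
    return (min(a, b) // 25, f(a), f(b))


def tracer(f: Callable[[int], str], pair: Pair) -> Pair:
    """Local search: bisect until the two inputs are adjacent."""
    a, b = sorted(pair)
    if f(a) == f(b):
        return (a, b)  # not a boundary candidate, nothing to refine
    while b - a > 1:
        mid = (a + b) // 2
        if f(mid) != f(a):
            b = mid  # output already changes between a and mid
        else:
            a = mid  # output changes between mid and b
    return (a, b)


def try_insert(archive: Archive, f: Callable[[int], str], pair: Pair) -> bool:
    """Keep only the best boundary pair found so far in each cell."""
    q = quality(f, pair)
    if q == 0.0:
        return False
    cell = descriptor(f, pair)
    if cell not in archive or q > archive[cell][1]:
        archive[cell] = (pair, q)
        return True
    return False


def run(f: Callable[[int], str], lo: int = -100, hi: int = 200,
        budget: int = 2000, seed: int = 0) -> Archive:
    rng = random.Random(seed)
    archive: Archive = {}
    for _ in range(budget):
        if not archive or rng.random() < 0.3:
            # Sampler role: draw a fresh random pair from the input range.
            a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        else:
            # Explorer role: pick a filled cell uniformly and mutate its pair.
            (a, b), _ = archive[rng.choice(list(archive))]
            a = max(lo, min(hi, a + rng.randint(-10, 10)))
            b = max(lo, min(hi, b + rng.randint(-10, 10)))
        # Tracer role: refine the candidate before offering it to the archive.
        try_insert(archive, f, tracer(f, (a, b)))
    return archive


if __name__ == "__main__":
    for cell, (pair, q) in sorted(run(sut).items()):
        print(cell, pair, q)
```

Selecting parents uniformly over filled cells, rather than over all stored pairs, is one simple way to keep the search from concentrating on a single region. The sketch does not implement the steering toward underrepresented regions that the abstract describes. Because each cell keeps only its best pair, a high-quality boundary is never displaced by a lower-quality one that maps to the same cell.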
