Using Quality Attribute Scenarios for ML Model Test Case Generation
Rachel Brower-Sinning, Grace A. Lewis, Sebastián Echeverría, Ipek Ozkaya
TL;DR
The paper addresses a gap in ML model testing: current practice focuses almost solely on model performance and neglects the requirements and constraints of the system that integrates the model. It proposes quality attribute (QA) scenarios to capture system- and data-level requirements for ML-enabled systems, and integrates QA-driven test case generation into MLTE, enabling negotiation, specification, and testing of system and QA properties alongside process and model behavior. Through a botanical garden scenario, the authors demonstrate how QA scenarios yield test cases across fairness, robustness, interpretability, and integration constraints, and feedback from MLTE users indicates that the approach helps identify failures earlier in the development process. The approach promotes broader architectural thinking in ML development and provides a path toward more trustworthy, production-ready deployments; future work includes formalizing negotiation artifacts and expanding QA coverage.
Abstract
Testing of machine learning (ML) models is a known challenge identified by researchers and practitioners alike. Unfortunately, current practice for ML model testing prioritizes testing for model performance, while often neglecting the requirements and constraints of the ML-enabled system that integrates the model. This limited view of testing leads to failures during integration, deployment, and operations, contributing to the difficulties of moving models from development to production. This paper presents an approach based on quality attribute (QA) scenarios to elicit and define system- and model-relevant test cases for ML models. The QA-based approach described in this paper has been integrated into MLTE, a process and tool to support ML model test and evaluation. Feedback from users of MLTE highlights its effectiveness in testing beyond model performance and identifying failures early in the development process.
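As a rough illustration of how a QA scenario might be turned into a test case, the sketch below encodes a scenario in the six-part form commonly used in software architecture practice (source, stimulus, environment, artifact, response, response measure) and attaches a pass/fail check to its response measure. This is not MLTE's API: the class names `QAScenario` and `ScenarioCheck`, the flower-classification example, the measurement keys, and the 0.05 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class QAScenario:
    """Six-part quality attribute scenario (hypothetical structure, not MLTE's)."""
    quality_attribute: str
    source: str
    stimulus: str
    environment: str
    artifact: str
    response: str
    response_measure: str


@dataclass(frozen=True)
class ScenarioCheck:
    """Pairs a scenario with a predicate over measured values."""
    scenario: QAScenario
    predicate: Callable[[Dict[str, float]], bool]

    def run(self, measurements: Dict[str, float]) -> bool:
        # The test passes only if the response measure is satisfied.
        return self.predicate(measurements)


# Hypothetical robustness scenario for a garden flower classifier.
robustness = QAScenario(
    quality_attribute="robustness",
    source="garden visitor",
    stimulus="submits a blurry photo of a flower",
    environment="normal operations",
    artifact="flower classification model",
    response="model still classifies the flower correctly",
    response_measure="accuracy on blurred images drops by at most 0.05",
)

MAX_ACCURACY_DROP = 0.05  # assumed threshold, taken from the response measure


def accuracy_drop_ok(m: Dict[str, float]) -> bool:
    # Compare accuracy on clean images with accuracy on blurred images.
    return m["clean_accuracy"] - m["blurred_accuracy"] <= MAX_ACCURACY_DROP


check = ScenarioCheck(robustness, accuracy_drop_ok)

if __name__ == "__main__":
    measured = {"clean_accuracy": 0.92, "blurred_accuracy": 0.89}
    print(check.scenario.quality_attribute, "passed:", check.run(measured))
```

Keeping the response measure next to an executable predicate is one way to make a negotiated requirement directly testable; in practice the measurements would come from running the model on suitable evaluation data rather than from hand-entered values.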
