Quality Model for Machine Learning Components

Grace A. Lewis, Rachel Brower-Sinning, Robert Edman, Ipek Ozkaya, Sebastián Echeverría, Alex Derr, Collin Beaudoin, Katherine R. Maffey

TL;DR

The paper introduces a quality model specifically for ML components. It bridges the gap between component testing and system-derived requirements and addresses a shortcoming of ISO-based frameworks, which blend system and component attributes. Grounded in ISO 25010, ISO 25059, and ISO 25019 as well as prior studies, the authors use a card-sorting methodology to develop a model of 30 quality attributes (QAs) organized into 7 categories. They validate it through an invitation-only practitioner survey and integrate it into the open-source MLTE tool. The findings show that practitioners currently emphasize predictive quality but recognize the value of broader QA coverage for detecting issues early, and they highlight challenges such as data quality, ground truth, and operational variability. The work's practical contribution is a shared vocabulary for ML component developers and system stakeholders, plus a ready-to-use test catalog in MLTE. Planned extensions cover data quality and larger-scale validation, with the aim of further improving the production readiness of ML-enabled systems.

Abstract

Despite increased adoption of and advances in machine learning (ML), studies show that many ML prototypes do not reach the production stage and that testing is still largely limited to model properties, such as model performance, without considering requirements derived from the system the model will be part of, such as throughput, resource consumption, or robustness. This limited view of testing leads to failures in model integration, deployment, and operations. In traditional software development, quality models such as ISO 25010 provide a widely used structured framework to assess software quality, define quality requirements, and provide a common language for communication with stakeholders. A newer standard, ISO 25059, defines a more specific quality model for AI systems. However, this standard combines system attributes with ML component attributes, which is not helpful for a model developer, because many system attributes cannot be assessed at the component level. In this paper, we present a quality model for ML components that serves as a guide for requirements elicitation and negotiation and provides a common vocabulary for ML component developers and system stakeholders to agree on and define system-derived requirements and focus their testing efforts accordingly. The quality model was validated through a survey in which the participants agreed with its relevance and value. The quality model has been successfully integrated into an open-source tool for ML component testing and evaluation, demonstrating its practical application.
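To make the abstract's central point concrete, here is a minimal Python sketch of how system-derived requirements might be recorded against component-level quality attributes and checked against measured values. It is not taken from the paper or from the MLTE API. The `Requirement` class, the `evaluate` function, the attribute names, and all thresholds and measurements are hypothetical. The example shows a component that passes its predictive-quality requirement but fails a throughput requirement and has no robustness measurement at all, which is exactly the kind of gap that testing model performance alone would miss.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Requirement:
    """A system-derived requirement on one ML component quality attribute (hypothetical structure)."""
    attribute: str                   # name of the quality attribute being constrained
    description: str                 # human-readable statement agreed with system stakeholders
    check: Callable[[float], bool]   # returns True if a measured value satisfies the requirement


def evaluate(requirements: List[Requirement], measurements: Dict[str, float]) -> Dict[str, bool]:
    """Return pass/fail per attribute; an attribute with no measurement counts as failing."""
    results: Dict[str, bool] = {}
    for req in requirements:
        value = measurements.get(req.attribute)
        results[req.attribute] = value is not None and req.check(value)
    return results


# Hypothetical requirements and thresholds, chosen only for illustration.
requirements = [
    Requirement("accuracy", "accuracy on held-out test set >= 0.90", lambda v: v >= 0.90),
    Requirement("throughput", "at least 50 inferences per second", lambda v: v >= 50.0),
    Requirement("memory_mb", "peak memory below 512 MB", lambda v: v < 512.0),
    Requirement("robust_accuracy", "accuracy under input noise >= 0.80", lambda v: v >= 0.80),
]

# Hypothetical measurements: robustness was never tested, so it has no entry.
measurements = {"accuracy": 0.93, "throughput": 42.0, "memory_mb": 380.0}

if __name__ == "__main__":
    for attribute, passed in evaluate(requirements, measurements).items():
        print(f"{attribute}: {'pass' if passed else 'FAIL'}")
```

Treating a missing measurement as a failure is a deliberate choice in this sketch: it surfaces quality attributes that were agreed on with system stakeholders but never tested.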

Paper Structure (16 sections, 8 figures, 2 tables)

Figures (8)

  • Figure 1: ML-Enabled System
  • Figure 2: ML Component Requirement Sources (adapted from Kuwajima et al. 2020)
  • Figure 3: Quality Model Development Process
  • Figure 4: Quality Model for ML Components
  • Figure 5: Survey Participant Demographics