From Machine Learning Documentation to Requirements: Bridging Processes with Requirements Languages

Yi Peng, Hans-Martin Heyn, Jennifer Horkoff

TL;DR

The paper tackles the challenge of deriving software requirements for ML-enabled systems from informal ML documentation. It analyzes 20 publicly available ModelCards and DataSheets to quantify RE-relevant information and assesses how well three RE representation languages (EARS, Rupp's template, and Volere) can structure that information. The findings show that ML documentation contains substantial RE-relevant content, although it is sometimes redundant and inconsistently detailed. Volere provides the most comprehensive coverage when external factors are included, while EARS and Rupp's template capture core functional aspects but struggle with ML-specific context. The work demonstrates a viable pathway for bridging ML documentation and formal RE processes, offers practical guidelines for practitioners, and suggests directions for adapting RE languages to ML-specific content.

Abstract

In software engineering processes for machine learning (ML)-enabled systems, integrating and verifying ML components is a major challenge. A prerequisite is the specification of ML component requirements, including models and data, an area where traditional requirements engineering (RE) processes face new obstacles. An underexplored source of RE-relevant information in this context is ML documentation such as ModelCards and DataSheets. However, it is uncertain to what extent RE-relevant information can be extracted from these documents. This study first investigates the amount and nature of RE-relevant information in 20 publicly available ModelCards and DataSheets. We show that these documents contain a significant amount of potentially RE-relevant information. Next, we evaluate how effectively three established RE representations (EARS, Rupp's template, and Volere) can structure this knowledge into requirements. Our results demonstrate that there is a pathway to transform ML-specific knowledge into structured requirements, incorporating ML documentation in software engineering processes for ML systems.
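To make the idea of structuring documentation content into a requirement concrete, the following is a minimal sketch of rendering a statement in the general EARS sentence patterns (ubiquitous, event-driven, state-driven). The class name `EarsRequirement`, its fields, and the example statements are hypothetical illustrations; they are not taken from the paper or from any of the 20 analyzed documents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarsRequirement:
    """Hypothetical container for one EARS-style requirement."""
    system: str                     # name of the system or component
    response: str                   # what the system shall do
    trigger: Optional[str] = None   # event-driven pattern: "When <trigger>"
    state: Optional[str] = None     # state-driven pattern: "While <state>"

    def render(self) -> str:
        # Assemble optional preconditions, then the mandatory "shall" clause.
        parts = []
        if self.state:
            parts.append(f"while {self.state},")
        if self.trigger:
            parts.append(f"when {self.trigger},")
        parts.append(f"the {self.system} shall {self.response}.")
        sentence = " ".join(parts)
        # Capitalize only the first character, leaving the rest untouched.
        return sentence[0].upper() + sentence[1:]


# Invented example: a limitation such as "not evaluated on low-resolution
# images" might be restated as an event-driven requirement.
ubiquitous = EarsRequirement(
    system="classifier",
    response="report a confidence score for each prediction",
)
event_driven = EarsRequirement(
    system="classifier",
    response="reject the input",
    trigger="an input image is below the supported resolution",
)

print(ubiquitous.render())
print(event_driven.render())
```

A template like this only covers the sentence skeleton; context that ML documentation often carries, such as data provenance or open issues, has no obvious slot in it, which is the kind of gap the study examines.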

Paper Structure

This paper contains 20 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: (a) EARS [mavin_ears], (b) Rupp's template [mazo2020towards], and (c) Volere [robertson2012mastering].
  • Figure 2: Examples of template excerpts from ModelCards [mitchell2019model] and DataSheets [Gebru2021Datasheets].
  • Figure 3: Example of Volere capturing open issues.