
Machine Learning with Requirements: a Manifesto

Eleonora Giunchiglia, Fergus Imrie, Mihaela van der Schaar, Thomas Lukasiewicz

TL;DR

The paper tackles the problem of deploying ML in high-stakes domains by arguing that explicit requirements definition and verification are essential to curb unsafe model behavior. It introduces a requirements-driven pyramid development pipeline that tightly couples requirements with all stages of data curation, model construction, training, testing, and deployment, allowing requirements to evolve with discovery during development. Through healthcare and autonomous driving exemplars, the authors show that high metric performance can mask violations of critical requirements, motivating a shift toward integrating logical constraints and domain knowledge, including neuro-symbolic methods, into the ML lifecycle. The work advocates adopting software-engineering practices—such as documentation, checklists, and formal verification—to achieve safer, more certifiable AI systems and outlines future research directions for operationalizing this paradigm in practice.

Abstract

In recent years, machine learning has made great advances that have been at the root of many breakthroughs in different application domains. However, it is still an open issue how to make machine learning models applicable to high-stakes or safety-critical application domains, as they can often be brittle and unreliable. In this paper, we argue that requirements definition and satisfaction can go a long way towards making machine learning models better suited to the real world, especially in critical domains. To this end, we present two problems in which (i) requirements arise naturally, (ii) machine learning models are or can be fruitfully deployed, and (iii) neglecting the requirements can have dramatic consequences. We show how requirements specification can be fruitfully integrated into the standard machine learning development pipeline, proposing a novel pyramid development process in which requirements definition may impact all the subsequent phases in the pipeline, and vice versa.

Paper Structure

This paper contains 6 sections and 4 figures.

Figures (4)

  • Figure 1: Visualization of the standard machine learning pipeline.
  • Figure 2: Two panels show the predictions made by the I3D model (with $\theta=0.5$) for the same traffic light, just one frame apart. A further panel, taken from giunchiglia2022_road, shows the number of predictions that violate at least one requirement when varying $\theta$.
  • Figure 3: One panel shows two examples of the predictions made by different classification models for the same patient at different time horizons. Another panel shows the proportion of samples for which the predictions violate the requirement that risk increase for longer time horizons.
  • Figure 4: Visualization of the pyramid model. The full arrows stand for the standard procedural processes, while the dotted arrows show that the requirements impact every stage of the process.
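
The two violation measures described in the captions of Figures 2 and 3 can be illustrated with a minimal sketch. It is not the authors' code. The function names, the toy arrays, the example requirement (a traffic light cannot be labelled both red and green), and the reading of "increasing" as "non-decreasing" are all assumptions made for illustration only.

```python
import numpy as np


def count_requirement_violations(probs, forbidden_pairs, theta):
    """Count samples whose thresholded labels contain a forbidden label pair.

    probs: (n_samples, n_labels) array of predicted probabilities.
    forbidden_pairs: hypothetical requirements, each a pair of label
        indices that must never both be predicted for the same sample.
    theta: decision threshold; a label is predicted when prob > theta.
    """
    labels = probs > theta
    violated = np.zeros(len(probs), dtype=bool)
    for i, j in forbidden_pairs:
        violated |= labels[:, i] & labels[:, j]
    return int(violated.sum())


def monotonic_risk_violation_rate(risk):
    """Fraction of samples whose predicted risk drops at a longer horizon.

    risk: (n_samples, n_horizons) array, horizons ordered short to long.
    Assumption: equal risk at consecutive horizons is allowed.
    """
    decreases = np.diff(risk, axis=1) < 0
    return float(decreases.any(axis=1).mean())


# Toy multi-label predictions; columns are [red_light, green_light] (hypothetical).
probs = np.array([[0.90, 0.10],
                  [0.60, 0.70],
                  [0.20, 0.80],
                  [0.55, 0.45]])
forbidden = [(0, 1)]  # assumed requirement: not red and green at once
violations_by_theta = {
    theta: count_requirement_violations(probs, forbidden, theta)
    for theta in (0.3, 0.5, 0.7)
}

# Toy risk predictions for four patients at three increasing time horizons.
risk = np.array([[0.10, 0.20, 0.30],
                 [0.30, 0.25, 0.40],
                 [0.50, 0.50, 0.60],
                 [0.20, 0.40, 0.35]])
violation_rate = monotonic_risk_violation_rate(risk)

print(violations_by_theta)
print(violation_rate)
```

Both checks run on model outputs alone and need no ground-truth labels, so under these assumptions they could sit alongside standard accuracy metrics at test time.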