Outline of an Independent Systematic Blackbox Test for ML-based Systems
Hans-Werner Wiesbrock, Jürgen Großmann
TL;DR
The paper addresses the challenge of validating ML-based systems in a training-independent, statistically sound manner. It introduces Probabilistically Extended Ontologies (PEON), which attach probability distributions to partitions of the Operational Design Domain (ODD). On this basis it develops a formal testing framework in which test outcomes follow a Bernoulli process per partition, with end-of-test criteria derived from significance levels and power. The approach is demonstrated with toy and real-data experiments (e.g., COCO/CenterNet object detection and PETA), which show improved representativeness when marginal and conditional distributions are modeled. The work highlights the limitations of purely combinatorial (N-wise) testing for ML systems and offers a concrete data-generation pipeline that turns abstract PEONs into executable test cases via simulation. Overall, PEON provides a principled path toward reproducible, statistically valid black-box testing and potential certification of ML-based systems, with planned enhancements to data generation, ethical assessment, and sample-size estimation.
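To make the idea of an end-of-test criterion derived from significance level and power more concrete, the following is a minimal sketch for a single ODD partition. It assumes a one-sided test of a claimed success probability `p0` against a lower alternative `p1` and uses the standard normal-approximation sample-size formula for a Bernoulli proportion. The function name `required_samples` and its parameters are hypothetical and chosen for illustration; the paper's own criterion may differ.

```python
import math
from statistics import NormalDist


def required_samples(p0: float, p1: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of Bernoulli trials needed in one partition.

    Assumption (not taken from the paper): one-sided test of
    H0: p = p0 against H1: p = p1 < p0, normal approximation.
    p0    -- claimed success probability (e.g. stated accuracy)
    p1    -- lower success probability the test should detect
    alpha -- significance level (probability of a false rejection)
    power -- probability of rejecting H0 when the true rate is p1
    """
    if not (0.0 < p1 < p0 < 1.0):
        raise ValueError("expected 0 < p1 < p0 < 1")
    if not (0.0 < alpha < 1.0 and 0.0 < power < 1.0):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)        # quantile matching the desired power

    # Standard deviations of a single trial under H0 and under H1.
    sd0 = math.sqrt(p0 * (1.0 - p0))
    sd1 = math.sqrt(p1 * (1.0 - p1))

    n = ((z_alpha * sd0 + z_power * sd1) / (p0 - p1)) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical example: claimed accuracy 0.90, detect a drop to 0.80.
    print(required_samples(0.90, 0.80, alpha=0.05, power=0.8))
```

Under these assumptions, the required number of test cases grows as the gap between `p0` and `p1` shrinks or as `alpha` is tightened. This is one reason a per-partition budget can become large for rare but safety-relevant partitions.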
Abstract
This article proposes a test procedure for testing ML models and ML-based systems independently of the actual training process. In this way, typical quality statements about these models and systems, such as accuracy and precision, can be verified independently, taking into account their black-box character and the inherent stochastic properties of ML models and their training data. The article presents first results from a set of test experiments and suggests extensions to existing test methods that reflect the stochastic nature of ML models and ML-based systems.
