InfoPos: A Design Support Framework for ML-Assisted Fault Detection and Identification in Industrial Cyber-Physical Systems
Uraz Odyurt, Richard Loendersloot, Tiedo Tinga
TL;DR
The paper tackles the design-time complexity of ML-assisted fault detection in industrial CPS by introducing InfoPos, a framework that maps available system knowledge and data to an information-position matrix to guide selection of preprocessing, data segmentation, and ML components. It presents a concrete methodology using a real demonstrator, including data processing to produce regression-based signatures, diverse segmentation strategies, and data-degradation simulations to emulate varying data richness, evaluated with multiple tree-based classifiers. Key findings show that the electrical current metric is highly informative, mid data cuts with quadratic signatures often yield top performance, and BDT-like tree methods are robust choices, achieving up to 99.54% accuracy in some configurations. The framework aims to reduce design search and optimization effort for ML-driven FDI in industrial CPS, and the authors provide publicly available datasets and code to support reproducibility and further exploration, with planned extensions to richer data, deep learning models, and production-scale validation.
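To make the notion of a regression-based signature on a mid data cut more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a "mid cut" keeps the central portion of a sampled metric (for example, electrical current) and that a quadratic signature is the coefficient triple of a least-squares second-order polynomial fit. The function names `mid_cut`, `solve_3x3` and `quadratic_signature`, the `fraction` parameter, and the time normalisation to [0, 1] are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def mid_cut(samples: Sequence[float], fraction: float = 0.5) -> List[float]:
    """Keep the central `fraction` of the samples (assumed meaning of a 'mid' cut)."""
    n = len(samples)
    keep = min(n, max(3, int(round(n * fraction))))
    start = (n - keep) // 2
    return list(samples[start:start + keep])


def solve_3x3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; too few distinct sample points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        residual = m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = residual / m[r][r]
    return x


def quadratic_signature(samples: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = c0 + c1*t + c2*t^2, with t normalised to [0, 1].

    Returns (c0, c1, c2), which could serve as a compact feature vector
    for a downstream classifier (an assumption of this sketch).
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least three samples for a quadratic fit")
    t = [i / (n - 1) for i in range(n)]
    # Normal equations: sum_k c_k * sum_i t_i^(j+k) = sum_i t_i^j * y_i
    power_sums = [sum(ti ** p for ti in t) for p in range(5)]
    a = [[power_sums[j + k] for k in range(3)] for j in range(3)]
    b = [sum((ti ** j) * y for ti, y in zip(t, samples)) for j in range(3)]
    c0, c1, c2 = solve_3x3(a, b)
    return (c0, c1, c2)


if __name__ == "__main__":
    # Synthetic stand-in for one recorded metric; not data from the paper.
    signal = [0.2 + 1.5 * (i / 39) - 0.8 * (i / 39) ** 2 for i in range(40)]
    print(quadratic_signature(mid_cut(signal, fraction=0.5)))
```

In such a setup, each recorded run would be reduced to three coefficients per metric, and the resulting vectors would be passed to the tree-based classifiers. Whether the cut, the polynomial order, or the normalisation match the paper's pipeline should be checked against the published code.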
Abstract
The variety of building blocks and algorithms incorporated in data-centric and ML-assisted fault detection and identification solutions is high, contributing to two challenges: selecting the most effective set and order of building blocks, and achieving such a selection at minimum cost. Considering that ML-assisted solution design is influenced by the extent of available data and the extent of available knowledge of the target system, it is advantageous to be able to select effective and matching building blocks. We introduce the first iteration of our InfoPos framework, which allows fault detection/identification use-cases to be placed according to the available levels (positions), i.e., from poor to rich, of the knowledge and data dimensions. With that input, designers and developers can reveal the most effective corresponding choice(s), streamlining the solution design process. The results from a demonstrator, a fault identification use-case for industrial Cyber-Physical Systems, reflect the effects achieved when different building blocks are used across knowledge and data positions. The achieved ML model performance is taken as the indicator of a better solution. The data processing code and composed datasets are publicly available.
