Integration of Domain Expert-Centric Ontology Design into the CRISP-DM for Cyber-Physical Production Systems
Milapji Singh Gill, Tom Westermann, Marvin Schieseck, Alexander Fay
TL;DR
This paper tackles the data-understanding and data-preparation bottlenecks in CPPS analytics by integrating domain expert-centric ontology design into the CRISP-DM workflow. It presents a structured extension that inserts ontology design steps between business understanding and data understanding, using lightweight and heavyweight ontology artifacts and clearly defined roles to build modular, reusable knowledge representations. The approach is demonstrated with an anomaly detection use case in a hybrid mixing plant, where ontology-driven data access and a learned timed automaton enable efficient data exploration and preparation. The findings suggest efficiency gains in data understanding and preparation as well as reusable artifacts. They also highlight the need for broader validation across more CPPS scenarios and point to opportunities to automate parts of the workflow.
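To make the idea of ontology-driven data access more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy set of subject-predicate-object triples in which every name (`Mixer1`, `hasSensor`, `observes`, `hasDataColumn`, `T_MIX_01`, and so on) is hypothetical. It shows how a data scientist could retrieve the data columns relevant to a plant component or a physical quantity without knowing the plant's signal-naming conventions. A real setup would more likely store an RDF/OWL ontology and query it with SPARQL.

```python
from typing import Optional

# Hypothetical triples describing a small part of a mixing plant.
# Each triple is (subject, predicate, object).
TRIPLES = {
    ("Mixer1", "rdf:type", "Mixer"),
    ("Mixer1", "hasSensor", "TempSensor1"),
    ("Mixer1", "hasSensor", "LevelSensor1"),
    ("TempSensor1", "observes", "Temperature"),
    ("LevelSensor1", "observes", "FillLevel"),
    ("TempSensor1", "hasDataColumn", "T_MIX_01"),
    ("LevelSensor1", "hasDataColumn", "L_MIX_01"),
    ("Tank1", "hasSensor", "LevelSensor2"),
    ("LevelSensor2", "observes", "FillLevel"),
    ("LevelSensor2", "hasDataColumn", "L_TANK_01"),
}


def objects(subject: str, predicate: str) -> list[str]:
    """Return all objects linked to `subject` via `predicate`, sorted."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def subjects(predicate: str, obj: str) -> list[str]:
    """Return all subjects linked to `obj` via `predicate`, sorted."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


def columns_for(component: str, quantity: Optional[str] = None) -> list[str]:
    """Data columns of all sensors on `component`, optionally filtered by quantity."""
    cols: list[str] = []
    for sensor in objects(component, "hasSensor"):
        if quantity is None or quantity in objects(sensor, "observes"):
            cols.extend(objects(sensor, "hasDataColumn"))
    return sorted(cols)


def columns_for_quantity(quantity: str) -> list[str]:
    """Data columns of every sensor in the plant that observes `quantity`."""
    cols: list[str] = []
    for sensor in subjects("observes", quantity):
        cols.extend(objects(sensor, "hasDataColumn"))
    return sorted(cols)


if __name__ == "__main__":
    print(columns_for("Mixer1"))                 # ['L_MIX_01', 'T_MIX_01']
    print(columns_for("Mixer1", "Temperature"))  # ['T_MIX_01']
    print(columns_for_quantity("FillLevel"))     # ['L_MIX_01', 'L_TANK_01']
```

The lookup functions only traverse explicit links in the knowledge representation, so adding a new sensor means adding triples rather than changing the data-preparation code. This decoupling is what makes such artifacts reusable across analyses.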
Abstract
In the age of Industry 4.0 and Cyber-Physical Production Systems (CPPSs), vast amounts of potentially valuable data are being generated. Methods from Machine Learning (ML) and Data Mining (DM) have proven promising for extracting complex and hidden patterns from the collected data. The knowledge obtained can in turn be used to improve tasks such as diagnostics or maintenance planning. However, such data-driven projects, usually carried out according to the Cross-Industry Standard Process for Data Mining (CRISP-DM), often fail because of the disproportionate amount of time needed to understand and prepare the data. Domain-specific ontologies have proven advantageous in addressing these challenges in a wide variety of Industry 4.0 application scenarios. However, workflows and artifacts from ontology design for CPPSs have not yet been systematically integrated into the CRISP-DM. Accordingly, this contribution presents an integrated approach that enables data scientists to gain insights into the CPPS more quickly and reliably. The approach is applied to an anomaly detection use case as an example.
