Causality-driven Sequence Segmentation for Enhancing Multiphase Industrial Process Data Analysis and Soft Sensing
Yimeng He, Le Yao, Xinmin Zhang, Xiangyin Kong, Zhihuan Song
TL;DR
Multiphase industrial processes exhibit evolving causal relationships that hinder conventional soft sensing. The authors propose causality-driven sequence segmentation (CDSS), which segments sequences by detecting abrupt shifts in causal mechanisms using NTS-NOTEARS-based causal discovery and outputs a temporal causal graph for each phase. A phase-specific temporal-causal graph convolutional network (TC-GCN) is then trained on each phase for soft sensing. A similarity-distance metric combining causal and stable components guides phase extension and breakpoint detection. Case studies on stationary and non-stationary numerical data and a penicillin fed-batch process show accurate breakpoints and substantial gains in predictive accuracy with TC-GCN compared to non-segmented or non-causal baselines. The work advances interpretable, causality-aware segmentation and phase-aware soft sensing for industrial big data applications.
Abstract
The dynamic characteristics of multiphase industrial processes present significant challenges for industrial big data modeling. Traditional soft sensing models frequently neglect process dynamics and have difficulty capturing transient phenomena such as phase transitions. To address this issue, this article introduces a causality-driven sequence segmentation (CDSS) model. The model first identifies the local dynamic properties of the causal relationships between variables, referred to as causal mechanisms. It then segments the sequence into phases based on the sudden shifts in causal mechanisms that occur during phase transitions. A novel metric, similarity distance, is designed to evaluate the temporal consistency of causal mechanisms; it comprises a causal similarity distance and a stable similarity distance. The causal relationships discovered in each phase are represented as a temporal causal graph (TCG). Furthermore, a soft sensing model called the temporal-causal graph convolutional network (TC-GCN) is trained for each phase using the time-extended data and the adjacency matrix of the TCG. Numerical examples are used to validate the proposed CDSS model, and the segmentation results demonstrate that CDSS performs well on both stationary and non-stationary multiphase series. In particular, it separates non-stationary time series more accurately than other methods. The effectiveness of the proposed CDSS and TC-GCN models is also verified on a penicillin fermentation process. Experimental results indicate that the breakpoints discovered by CDSS align well with the reaction mechanisms and that TC-GCN achieves excellent predictive accuracy.
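To make the segmentation idea concrete, the following is a minimal sketch of breakpoint detection driven by a shift in lagged dependencies. It is not the authors' CDSS implementation: a least-squares lag-1 regression stands in for NTS-NOTEARS, a plain Frobenius norm stands in for the similarity distance, and the function names, window size, threshold, and simulated data are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_two_phase(n_per_phase=500, seed=0):
    """Hypothetical 3-variable series; the driver of x1 switches from x0 to x2."""
    rng = np.random.default_rng(seed)
    # A[i, j] is the effect of x_j(t-1) on x_i(t).
    A1 = np.array([[0.5, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, 0.0, 0.5]])
    A2 = np.array([[0.5, 0.0, 0.0], [0.0, 0.0, 0.8], [0.0, 0.0, 0.5]])
    x = np.zeros((2 * n_per_phase, 3))
    for t in range(1, 2 * n_per_phase):
        A = A1 if t < n_per_phase else A2
        x[t] = A @ x[t - 1] + rng.normal(size=3)
    return x

def lagged_coeffs(x):
    """Least-squares lag-1 coefficients (a simple stand-in for causal discovery)."""
    B, *_ = np.linalg.lstsq(x[:-1], x[1:], rcond=None)
    return B.T  # transpose so that x_t is approximately A @ x_{t-1}

def find_breakpoint(x, window=150):
    """Return the index where the structures before and after differ most."""
    candidates = range(window, len(x) - window)
    dist = [
        np.linalg.norm(lagged_coeffs(x[t - window:t]) - lagged_coeffs(x[t:t + window]))
        for t in candidates
    ]
    return candidates[int(np.argmax(dist))]

x = simulate_two_phase()
bp = find_breakpoint(x)
# One thresholded lag-1 graph per phase (threshold 0.3 is an arbitrary choice).
graphs = [np.abs(lagged_coeffs(seg)) > 0.3 for seg in (x[:bp], x[bp:])]
print("estimated breakpoint:", bp)
```

In this toy setting the true change occurs at index 500, and each entry `graphs[k][i, j]` indicates an estimated lag-1 edge from variable `j` to variable `i` in phase `k`. A per-phase adjacency matrix of this kind is the sort of input a phase-specific graph model would consume.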
