Combining SHAP and Causal Analysis for Interpretable Fault Detection in Industrial Processes

Pedro Cortes dos Santos, Matheus Becali Rocha, Renato A Krohling

TL;DR

This work tackles fault detection in complex industrial processes by combining SHAP-based feature selection with causal analysis using Directed Acyclic Graphs, applied to the Tennessee Eastman Process. The approach reduces dimensionality from 52 to 10 variables without sacrificing accuracy, while simultaneously revealing causal pathways for fault propagation. Five DAG algorithms (PC, FCI, RFCI, LiNGAM, NOTEARS) converge on robust fault mechanisms, notably cooling systems and stripper operations, yielding interpretable, actionable insights for operators. The framework enhances detection performance and interpretability, offering a practical tool for smarter, more transparent fault monitoring in Industry 4.0 contexts.

Abstract

Industrial processes generate complex data that challenge fault detection systems, often yielding opaque or underwhelming results despite advanced machine learning techniques. This study tackles such difficulties using the Tennessee Eastman Process, a well-established benchmark known for its intricate dynamics, to develop an innovative fault detection framework. Initial attempts with standard models revealed limitations in both performance and interpretability, prompting a shift toward a more tractable approach. By employing SHAP (SHapley Additive exPlanations), we transform the problem into a more manageable and transparent form, pinpointing the most critical process features driving fault predictions. This reduction in complexity unlocks the ability to apply causal analysis through Directed Acyclic Graphs, generated by multiple algorithms, to uncover the underlying mechanisms of fault propagation. The resulting causal structures align strikingly with SHAP findings, consistently highlighting key process elements, such as cooling and separation systems, as pivotal to fault development. Together, these methods not only enhance detection accuracy but also provide operators with clear, actionable insights into fault origins, a synergy that, to our knowledge, has not been previously explored in this context. This dual approach bridges predictive power with causal understanding, offering a robust tool for monitoring complex manufacturing environments and paving the way for smarter, more interpretable fault detection in industrial systems.
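
The SHAP-based selection step described in the abstract can be illustrated with a minimal sketch. It assumes per-sample SHAP values have already been computed by some explainer and ranks features by mean absolute SHAP value, which is a common convention and not necessarily the exact criterion used by the authors. The function name `top_k_by_mean_abs_shap`, the feature names `f0` to `f3`, and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def top_k_by_mean_abs_shap(shap_rows, feature_names, k=10):
    """Rank features by mean |SHAP| across samples and keep the top k.

    shap_rows: list of rows, one per sample, each holding one SHAP value
    per feature (assumed to be precomputed by an explainer).
    """
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")
    n_features = len(feature_names)
    if any(len(row) != n_features for row in shap_rows):
        raise ValueError("every row must have one value per feature")

    # Global importance of a feature = average magnitude of its attributions.
    importance = {
        name: fmean(abs(row[j]) for row in shap_rows)
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k], importance


# Toy, made-up attributions for 3 samples and 4 hypothetical features.
names = ["f0", "f1", "f2", "f3"]
rows = [
    [0.5, -0.1, 0.00, 0.2],
    [-0.7, 0.1, 0.05, -0.2],
    [0.6, -0.2, 0.00, 0.3],
]
selected, importance = top_k_by_mean_abs_shap(rows, names, k=2)
print(selected)
```

In the setting described here, `k=10` would correspond to the reduction from 52 to 10 variables; a classifier would then be retrained on the selected columns only to check that accuracy is maintained.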

Paper Structure

This paper contains 31 sections, 6 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Tennessee Eastman Process (TEP) diagram (li2011).
  • Figure 2: Simplified process flow diagram of TEP.
  • Figure 3: Framework integrating SHAP-based feature selection with causal analysis for industrial fault detection. The workflow shows how dimensionality reduction to just 10 variables maintains model performance while enabling interpretable causal analysis through multiple DAG algorithms.
  • Figure 4: SHAP values for the top 15 variables in the TEP dataset. Variables such as XMV.11 (Condenser cooling water flow) and XMEAS.17 (Stripper underflow) have the highest impact on model predictions. Higher SHAP values indicate greater importance for fault detection.
  • Figure 5: Complete 52-variable DAG of the TEP dataset (RFCI algorithm) demonstrating the inherent complexity of industrial process monitoring.
  • ...and 5 more figures
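
The figures above refer to causal graphs produced by several DAG algorithms. One simple way to check where such algorithms agree is to count how many of them report each directed edge. The sketch below is an assumption about how that comparison could be done, not the authors' documented procedure. It also assumes each algorithm's output has already been reduced to a list of directed edges, which glosses over the fact that FCI and RFCI can return partially oriented edges. The node names `A` to `D` and all edges are placeholders, not results from the paper.

```python
from collections import Counter


def edge_support(graphs):
    """Count, for each directed edge, how many algorithms report it.

    graphs: dict mapping an algorithm name to an iterable of
    (cause, effect) tuples.
    """
    counts = Counter()
    for edges in graphs.values():
        # set() so a duplicated edge in one output counts only once.
        counts.update(set(edges))
    return counts


def consensus_edges(graphs, min_votes):
    """Return edges reported by at least min_votes algorithms, sorted."""
    counts = edge_support(graphs)
    return sorted(edge for edge, votes in counts.items() if votes >= min_votes)


# Placeholder edge lists; real inputs would come from each algorithm's output.
toy_graphs = {
    "PC": [("A", "B"), ("B", "C")],
    "FCI": [("A", "B")],
    "RFCI": [("A", "B"), ("C", "D")],
    "LiNGAM": [("A", "B"), ("B", "C")],
    "NOTEARS": [("B", "C"), ("A", "B")],
}
support = edge_support(toy_graphs)
robust = consensus_edges(toy_graphs, min_votes=3)
print(robust)
```

Edges with high support across algorithms that rest on different assumptions (constraint-based, functional, and score-based) are more plausible candidates for genuine fault-propagation pathways than edges found by a single method.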