A Neuro-Symbolic Explainer for Rare Events: A Case Study on Predictive Maintenance

João Gama, Rita P. Ribeiro, Saulo Mastelini, Narjes Davari, Bruno Veloso

TL;DR

Addressing explainability in predictive maintenance, the paper presents a dual-layer online framework that combines an unsupervised LSTM-AE detector with an online regression-rule learner (AMRules) to explain high reconstruction error. The system delivers both global explanations of the detector’s behavior and local explanations for individual alarms, enhanced by a Chebyshev-based oversampling strategy (ChebyOS) to focus on rare failure cases. The Metro do Porto case study demonstrates actionable sensor-level explanations for air and oil leaks and compares rule-based explanations with SHAP, highlighting improved interpretability in critical maintenance contexts. Overall, the work advances practical, real-time interpretability for black-box PdM models and can be extended to other online imbalanced streaming tasks.

Abstract

Predictive Maintenance applications are increasingly complex, with interactions between many components. Black-box models based on deep learning techniques are popular because of their predictive accuracy. This paper proposes a neuro-symbolic architecture that uses an online rule-learning algorithm to explain when the black-box model predicts failures. The proposed system solves two problems in parallel: anomaly detection and explanation of the anomaly. For the first problem, we use an unsupervised state-of-the-art autoencoder. For the second problem, we train a rule-learning system that learns a mapping from the input features to the autoencoder reconstruction error. Both systems run online and in parallel. The autoencoder signals an alarm for examples whose reconstruction error exceeds a threshold. The causes of the alarm are hard for humans to understand because they result from a non-linear combination of sensor data. The rule triggered by that example describes the relationship between the input features and the autoencoder reconstruction error. The rule explains the failure signal by indicating which sensors contribute to the alarm, which allows the component involved in the failure to be identified. The system can present global explanations for the black-box model and local explanations for why the black-box model predicts a failure. We evaluate the proposed system in a real-world case study of Metro do Porto and provide explanations that illustrate its benefits.
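
The two-learner loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a running-mean "reconstruction" stands in for the LSTM-AE, and a per-sensor correlation tracker stands in for the AMRules rule learner. The sensor names, the threshold value, the synthetic stream, and the choice to adapt the detector only on non-alarm samples are all assumptions made for illustration.

```python
from math import sqrt

SENSORS = ["pressure", "oil_temp", "motor_current"]  # hypothetical sensor names
THRESHOLD = 1.0  # hypothetical alarm threshold on the reconstruction error


class RunningMeanDetector:
    """Stand-in for the autoencoder: 'reconstructs' each sensor as its running mean."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = [0.0] * n_features

    def error(self, x):
        if self.n == 0:  # nothing learned yet, so no meaningful error
            return 0.0
        return sum((xi - mi) ** 2 for xi, mi in zip(x, self.mean))

    def update(self, x):
        self.n += 1
        self.mean = [mi + (xi - mi) / self.n for xi, mi in zip(x, self.mean)]


class CorrelationExplainer:
    """Stand-in for the rule learner: tracks how each sensor co-varies with the error."""

    def __init__(self, n_features):
        self.n = 0
        self.mean_x = [0.0] * n_features
        self.mean_e = 0.0
        self.m2_x = [0.0] * n_features  # sum of squared deviations per sensor
        self.m2_e = 0.0                 # sum of squared deviations of the error
        self.co = [0.0] * n_features    # co-moment between each sensor and the error

    def update(self, x, err):
        self.n += 1
        de = err - self.mean_e          # deviation from the previous error mean
        self.mean_e += de / self.n
        self.m2_e += de * (err - self.mean_e)
        for j, xj in enumerate(x):
            dx = xj - self.mean_x[j]    # deviation from the previous sensor mean
            self.mean_x[j] += dx / self.n
            self.m2_x[j] += dx * (xj - self.mean_x[j])
            self.co[j] += dx * (err - self.mean_e)

    def ranking(self):
        """Sensors sorted by absolute correlation with the error, strongest first."""
        scores = []
        for j in range(len(self.co)):
            denom = sqrt(self.m2_x[j] * self.m2_e)
            scores.append(abs(self.co[j] / denom) if denom > 0 else 0.0)
        return sorted(zip(SENSORS, scores), key=lambda p: -p[1])


def stream():
    """Synthetic data: 20 normal readings, then 5 with a fault on the first sensor."""
    for t in range(25):
        d = 0.1 if t % 2 == 0 else -0.1
        first = 1.0 + d if t < 20 else 5.0
        yield [first, 2.0 - d, 3.0 + d]


def run():
    detector = RunningMeanDetector(len(SENSORS))
    explainer = CorrelationExplainer(len(SENSORS))
    alarms = []
    for t, x in enumerate(stream()):
        err = detector.error(x)
        explainer.update(x, err)   # explainer learns the features -> error relation
        if err > THRESHOLD:
            alarms.append((t, explainer.ranking()[0][0]))  # local explanation
        else:
            detector.update(x)     # assumption: adapt only on non-alarm samples
    return alarms


if __name__ == "__main__":
    print(run())
```

Each alarm is paired with the sensor that currently co-varies most strongly with the error, a crude analogue of the rule that names the contributing sensors. A real rule learner would instead return a readable condition on the sensor values together with a predicted error.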

Paper Structure (21 sections, 5 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Global Architecture of the Online Anomaly Explanation System. The top panel details the fault detection system, while the bottom details the explanation system. Both systems run online and in parallel.
  • Figure 2: Schema of deep LSTM-AE for anomaly detection
  • Figure 3: A) Box plot for the target variable (reconstruction error); B) Relevance $\phi(\cdot)$ of the target values; C) K-value used in the over-sampling approach.
  • Figure 4: The Air Production Unit system with the position of the main sensors [veloso2022metropt].
  • Figure 5: Rules versus Shapley values for the air leak failure.
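
The K-value in Figure 3 relates to the Chebyshev-based oversampling mentioned in the TL;DR. Chebyshev's inequality bounds how often a value can lie far from the mean: $P(|Y - \mu| \ge t\sigma) \le 1/t^2$, so a large $t$ marks a rare target. The sketch below shows one plausible reading, assumed here rather than taken from the paper: $t$ is computed online for each reconstruction error, and the example is presented to the learner $\lceil t \rceil$ times. The class name, the method name, and the choice to score each value against the statistics seen before it are hypothetical.

```python
import math


class ChebyshevOversampler:
    """Online mean/variance of the target; rarer targets get a larger replay count."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's algorithm)

    def k_value(self, y):
        # Score y against the statistics seen so far (assumption: before the update).
        k = 1
        if self.n >= 2 and self.m2 > 0.0:
            std = math.sqrt(self.m2 / (self.n - 1))
            k = max(1, math.ceil(abs(y - self.mean) / std))
        # Welford update of the running mean and squared deviations.
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (y - self.mean)
        return k


def oversample(examples):
    """Yield each (x, y) pair k times, where k grows with the rarity of y."""
    sampler = ChebyshevOversampler()
    for x, y in examples:
        for _ in range(sampler.k_value(y)):
            yield x, y
```

Under this reading, examples with typical reconstruction error pass through once, while an extreme error is replayed several times, which gives the downstream learner more exposure to the rare failure cases.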