Interpretable Rules for Online Failure Prediction: A Case Study on the Metro do Porto dataset

Matthias Jakobs, Bruno Veloso, Joao Gama

TL;DR

This work tackles interpretable online failure prediction for Metro do Porto trains by coupling a Convolutional Autoencoder-based failure detector with an online rule-learning pipeline that derives both local and global, easily interpretable rules. Time-windowed sensor data are transformed into concise features (e.g., variance, min, max, mean) and used to train decision trees that explain detected failures. Detection relies on a failure probability $p_t(\text{failure})$, which is smoothed with a parameter $\alpha$ and reported as a failure once it exceeds the threshold $\tau_{\text{fail}}$. The key finding is that a single sensor, Flowmeter, is highly predictive of both failures in the dataset (an air leak and an oil leak), yielding simple thresholds that trigger alarms well before the LPS signal; cheaper alternative sensors can still produce valid explanations when Flowmeter is unavailable. The study shows that MetroPT2 is not especially challenging when Flowmeter is present, highlights the need to test on MetroPT3 for more robust, real-world applicability, and discusses limitations such as the infinite number of possible rules and unbounded historical data.
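The windowed features and the smoothed, thresholded failure probability mentioned in the summary can be illustrated with a minimal Python sketch. The function names are hypothetical. The exponential update $p_t = \alpha \cdot \text{flag}_t + (1 - \alpha) \cdot p_{t-1}$ and the `>=` comparison against $\tau_{\text{fail}}$ are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pvariance


def window_features(window):
    """Summarise one window of raw readings from a single sensor."""
    return {
        "min": min(window),
        "max": max(window),
        "mean": mean(window),
        "var": pvariance(window),  # population variance of the window
    }


def smoothed_failure_probability(flags, alpha):
    """Turn binary anomaly flags (0 or 1) into a smoothed p_t(failure).

    Assumed update rule (illustrative, not the paper's exact filter):
        p_t = alpha * flag_t + (1 - alpha) * p_{t-1},  with p_0 = 0.
    """
    p = 0.0
    probabilities = []
    for flag in flags:
        p = alpha * flag + (1 - alpha) * p
        probabilities.append(p)
    return probabilities


def alarms(probabilities, tau_fail=0.5):
    """Report a failure whenever p_t(failure) reaches the threshold."""
    return [p >= tau_fail for p in probabilities]


if __name__ == "__main__":
    print(window_features([1.0, 2.0, 3.0, 4.0]))
    probs = smoothed_failure_probability([0, 0, 1, 1, 1, 0], alpha=0.5)
    print(probs)
    print(alarms(probs))
```

A smaller $\alpha$ makes the probability react more slowly to isolated anomaly flags, which reduces spurious alarms at the cost of a later detection.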

Abstract

Due to their high predictive performance, predictive maintenance applications have increasingly been approached with Deep Learning techniques in recent years. However, as in other real-world application scenarios, the need for explainability is often stated but not sufficiently addressed. This study will focus on predicting failures on Metro trains in Porto, Portugal. While recent works have found high-performing deep neural network architectures that feature a parallel explainability pipeline, the generated explanations are fairly complicated and need help explaining why the failures are happening. This work proposes a simple online rule-based explainability approach with interpretable features that leads to straightforward, interpretable rules. We showcase our approach on MetroPT2 and find that three specific sensors on the Metro do Porto trains suffice to predict the failures present in the dataset with simple rules.

Paper Structure

This paper contains 12 sections, 4 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: The autoencoder used in this work consists of $N$ encoder and decoder blocks (left) with identical block structures (right).
  • Figure 2: Visualization of our online rule-learning approach. During the green section, we add non-anomalous data to our history. During the blue section, we add the data to the buffer instead and learn fitting rules. Once $p_t(\textrm{failure})$ stops monotonically increasing, we reset and return the learned rules, clear the buffer of anomalous data and add the datapoints to the history again.
  • Figure 3: Our model's predicted $p_t(\textrm{failure})$ (in blue) over the testing period of MetroPT2 (top row) and for both failures in more detail (bottom row). The actual failure periods are shown in grey, and the activation of the LPS signal is shown as dotted black lines. The threshold $\tau_{\textrm{fail}}$ at which $p_t(\textrm{failure})$ is reported as a failure is set to $\tau_{\textrm{fail}} = 0.5$ and shown in red.
  • Figure 4: The Flowmeter sensor during the entirety of the test data range. Notice that, most of the time, the values of Flowmeter are minimal and only increase during the annotated failures. For a more detailed view of both failures, see Fig. 5.
  • Figure 5: The Flowmeter sensor during the Air and Oil leaks. Flowmeter values are shown in green while $p_t(\textrm{failure})$ is shown in blue. A clear correlation between failures and high values of Flowmeter makes this sensor highly predictive of these failures.
  • ...and 1 more figure
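The history and buffer scheme described in the Figure 2 caption can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm. The class and function names are hypothetical, a single-feature threshold stands in for the decision-tree rule learner, and the buffering phase is assumed to last exactly as long as $p_t(\textrm{failure})$ strictly increases.

```python
def learn_threshold_rule(history, buffer):
    """Hypothetical stand-in for the rule learner: one threshold on one feature.

    Places the threshold halfway between the largest normal value seen so far
    and the smallest buffered (anomalous) value.
    """
    if not history or not buffer:
        return None
    threshold = (max(history) + min(buffer)) / 2
    return f"feature > {threshold:.2f}"


class OnlineRuleLearner:
    """Keeps a history of normal data and a buffer for the current episode."""

    def __init__(self):
        self.history = []   # non-anomalous feature values
        self.buffer = []    # feature values of the ongoing anomalous episode
        self.prev_p = 0.0   # previous p_t(failure)
        self.rule = None    # rule fitted on the current buffer

    def step(self, feature, p_failure):
        """Process one datapoint; return a rule when an episode ends."""
        finished_rule = None
        if p_failure > self.prev_p:
            # Assumed "blue" phase: p_t(failure) is still increasing.
            self.buffer.append(feature)
            self.rule = learn_threshold_rule(self.history, self.buffer)
        else:
            if self.buffer:
                # p_t(failure) stopped increasing: emit the rule and reset.
                finished_rule = self.rule
                self.buffer.clear()
                self.rule = None
            # "Green" phase: datapoints go to the history again.
            self.history.append(feature)
        self.prev_p = p_failure
        return finished_rule


if __name__ == "__main__":
    learner = OnlineRuleLearner()
    features = [0.1, 0.2, 0.1, 5.0, 6.0, 0.2]
    probabilities = [0.0, 0.0, 0.0, 0.5, 0.75, 0.375]
    for x, p in zip(features, probabilities):
        print(learner.step(x, p))
```

In this sketch a plateau in $p_t(\textrm{failure})$ already ends the episode, and any small rise from zero starts one; a practical version would likely need a tolerance on both conditions.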