A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations

Waleed Razzaq, Yun-Bo Zhao

Abstract

Estimating the Remaining Useful Life (RUL) of mechanical systems is pivotal in Prognostics and Health Management (PHM). Rolling-element bearings are among the most frequent causes of machinery failure, highlighting the need for robust RUL estimation methods. Existing approaches often suffer from poor generalization, lack of robustness, high data demands, and limited interpretability. This paper proposes a novel multimodal-RUL framework that jointly leverages image representations (ImR) and time-frequency representations (TFR) of multichannel, nonstationary vibration signals. The architecture comprises three branches: (1) an ImR branch and (2) a TFR branch, both employing multiple dilated convolutional blocks with residual connections to extract spatial degradation features; and (3) a fusion branch that concatenates these features and feeds them into an LSTM to model temporal degradation patterns. A multi-head attention mechanism then emphasizes salient features, and linear layers perform the final RUL regression. To enable effective multimodal learning, vibration signals are converted into ImR via the Bresenham line algorithm and into TFR using the Continuous Wavelet Transform. We also introduce multimodal Layer-wise Relevance Propagation (multimodal-LRP), a tailored explainability technique that enhances model transparency. The approach is validated on the XJTU-SY and PRONOSTIA benchmark datasets. Results show that our method matches or surpasses state-of-the-art baselines under both seen and unseen operating conditions, while requiring ~28% less training data on XJTU-SY and ~48% less on PRONOSTIA. The model exhibits strong noise resilience, and multimodal-LRP visualizations support the interpretability and trustworthiness of its predictions, making the framework well suited to real-world industrial deployment.
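
To make the ImR step mentioned in the abstract more concrete, the following is a minimal sketch of how a one-dimensional vibration signal could be rasterized into a binary image by joining consecutive samples with Bresenham's line algorithm. It is an illustration only, not the paper's rasterization algorithm: the function names `bresenham` and `signal_to_image`, the min-max scaling, the image size, and the single-channel input are all assumptions made here, and the paper's handling of horizontal and vertical channels is not reproduced.

```python
def bresenham(x0, y0, x1, y1):
    """Return the integer grid points on the line from (x0, y0) to (x1, y1).

    Standard all-octant integer form of Bresenham's line algorithm.
    """
    points = []
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 > -dy:  # step along x
            err -= dy
            x0 += sx
        if e2 < dx:   # step along y
            err += dx
            y0 += sy
    return points


def signal_to_image(signal, width=32, height=32):
    """Rasterize a non-empty 1-D signal into a height x width binary image.

    Hypothetical helper: samples are min-max scaled to pixel rows, spread
    evenly over the columns, and consecutive samples are joined by lines.
    """
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    n = len(signal)
    image = [[0] * width for _ in range(height)]

    def to_pixel(i, value):
        col = round(i * (width - 1) / max(n - 1, 1))
        # Row 0 is the top of the image, so larger values map to smaller rows.
        row = (height - 1) - round((value - lo) / span * (height - 1))
        return col, row

    prev = to_pixel(0, signal[0])
    image[prev[1]][prev[0]] = 1
    for i in range(1, n):
        cur = to_pixel(i, signal[i])
        for col, row in bresenham(*prev, *cur):
            image[row][col] = 1
        prev = cur
    return image


if __name__ == "__main__":
    # A three-sample "spike" drawn on a tiny 5 x 3 canvas.
    for row in signal_to_image([0.0, 1.0, 0.0], width=5, height=3):
        print("".join("#" if pixel else "." for pixel in row))
```

Drawing lines between samples, rather than plotting isolated points, keeps the waveform connected in the image even when adjacent samples differ by many pixel rows, which is presumably what makes the result usable as input to a convolutional branch.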

Paper Structure

This paper contains 43 sections, 36 equations, 10 figures, 6 tables, 3 algorithms.

Figures (10)

  • Figure 1: Illustration of Bresenham's line algorithm.
  • Figure 2: Visual representation of the proposed multimodal RUL framework. (a) Horizontal and vertical vibrational signals are rectangularized and processed using Algorithm \ref{algo:rast} and Algorithm \ref{algo:tf} to generate the ImR and TFR, respectively. These are then fed into the Multimodal Sequential Data Generator, which applies image processing techniques and generates RUL labels. (b) A multimodal AI model with parallel Image and TF branches processes the data, and their outputs are fused in the fusion branch to estimate RUL. (c) The estimated RUL is propagated backward using Algorithm \ref{algo:lrp} for Multimodal-LRP explanations.
  • Figure 3: (a) XJTU-SY testbed; (b) PRONOSTIA testbed for recording vibrational data.
  • Figure 4: RUL labels for both datasets.
  • Figure 5: Ablation experiment results for (a) XJTU-SY; (b) PRONOSTIA.
  • ...and 5 more figures