RULSurv: A probabilistic survival-based method for early censoring-aware prediction of remaining useful life in ball bearings

Christian Marius Lillelund, Fernando Pannullo, Morten Opprud Jakobsen, Manuel Morante, Christian Fischer Pedersen

TL;DR

RULSurv tackles censoring in bearing RUL estimation by coupling a KL-divergence–based event detector with survival-analysis models that naturally handle right-censoring. The method labels the onset of degradation via spectral-distance changes across bearing-frequency bands. It then learns individual survival distributions $S(t \mid \boldsymbol{x})$ using five models (CoxPH, GBSA, RSF, MTLR, and BNNSurv), ensuring monotonic RUL predictions via the survival curve. Bayesian approaches (e.g., BNNSurv with Monte Carlo Dropout) provide credible intervals that quantify uncertainty. Cross-validated experiments on the XJTU-SY dataset show competitive MAEs and state-of-the-art CRA under high load. The work demonstrates that incorporating censored data improves predictive accuracy and enables time-to-failure predictions with a probabilistic interpretation, offering actionable insight for predictive maintenance.
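The event detector summarized above can be sketched for a single frequency band. This is a minimal illustration under assumptions the summary does not pin down. Each window's distribution is estimated with a histogram over shared bin edges and add-one smoothing, and the divergence is taken from each window to the previous one. The threshold is a hypothetical mean + k * std rule computed on the first `n_ref` divergences, which are assumed healthy; it stands in for the paper's own threshold function. The names `consecutive_kl`, `detect_onset`, `n_ref` and `k` are illustrative only.

```python
from typing import Optional

import numpy as np


def kl_divergence(p_counts: np.ndarray, q_counts: np.ndarray) -> float:
    """Discrete D_KL(P || Q) from histogram counts, with add-one (Laplace) smoothing."""
    p = (p_counts + 1.0) / (p_counts.sum() + p_counts.size)
    q = (q_counts + 1.0) / (q_counts.sum() + q_counts.size)
    return float(np.sum(p * np.log(p / q)))


def consecutive_kl(signal: np.ndarray, w: int, bins: int = 20) -> np.ndarray:
    """KL divergence between the histogram of each window and that of the window before it."""
    n_windows = len(signal) // w
    # Shared bin edges so that histograms from different windows are comparable.
    edges = np.linspace(signal.min(), signal.max(), bins + 1)
    counts = [
        np.histogram(signal[k * w:(k + 1) * w], bins=edges)[0].astype(float)
        for k in range(n_windows)
    ]
    # Entry j is D_KL(window j + 1 || window j).
    return np.array([kl_divergence(counts[j + 1], counts[j]) for j in range(n_windows - 1)])


def detect_onset(signal: np.ndarray, w: int, n_ref: int = 5, k: float = 3.0) -> Optional[int]:
    """Index of the first window whose divergence exceeds the threshold, or None.

    Hypothetical threshold rule (not the paper's): mean + k * std of the first
    `n_ref` divergences, which are assumed to come from a healthy bearing.
    """
    kl = consecutive_kl(signal, w)
    baseline = kl[:n_ref]
    threshold = baseline.mean() + k * baseline.std()
    hits = np.flatnonzero(kl > threshold)
    return int(hits[0]) + 1 if hits.size else None
```

In RULSurv the detection is performed per bearing-frequency band. Running this sketch on each band's signal and combining the per-band onsets would be one way to mirror that, but how the bands are combined is not specified here.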

Abstract

Predicting the remaining useful life (RUL) of ball bearings is an active area of research, where novel machine learning techniques are continuously being applied to predict degradation trends and anticipate failures before they occur. However, few studies have explicitly addressed the challenge of handling censored data, where information about a specific event (e.g., mechanical failure) is incomplete or only partially observed. To address this issue, we introduce a novel and flexible method that uses Kullback-Leibler (KL) divergence for early fault detection and survival analysis, which naturally supports censored data, for RUL estimation. We demonstrate our approach on the XJTU-SY dataset using a 5-fold cross-validation strategy across three different operating conditions. When predicting the time to failure for bearings under the highest load (C1, 12.0 kN and 2100 RPM) with 25% random censoring, our approach achieves a mean absolute error (MAE) of 14.7 minutes (95% CI = 13.6-15.8) using a linear CoxPH model, and an MAE of 12.6 minutes (95% CI = 11.8-13.4) using a nonlinear Random Survival Forests model, compared to an MAE of 18.5 minutes (95% CI = 17.4-19.6) using a linear LASSO model that does not support censoring. Moreover, our approach achieves a mean cumulative relative accuracy (CRA) of 0.7586 over 5 bearings under the highest load, improving on several state-of-the-art baselines. Our work highlights the importance of considering censored data as part of the model design when building predictive models for early fault detection and RUL estimation.
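The two evaluation metrics reported in the abstract can be sketched as follows. MAE is the mean absolute difference between true and predicted time to failure. Here it is computed assuming the true failure time of every test bearing is known. For CRA, this sketch assumes the common prognostics definition: a relative accuracy RA_t = 1 - |r_t - r̂_t| / r_t at each evaluation time t, averaged without weights. The paper may use a weighted variant. The function names are illustrative.

```python
import numpy as np


def mae(true_times, pred_times) -> float:
    """Mean absolute error between true and predicted times to failure (e.g., minutes)."""
    return float(np.mean(np.abs(np.asarray(true_times, dtype=float)
                                - np.asarray(pred_times, dtype=float))))


def cra(true_rul, pred_rul) -> float:
    """Cumulative relative accuracy: unweighted mean of RA_t = 1 - |r_t - r_hat_t| / r_t.

    Evaluation times with a true RUL of zero are skipped to avoid division by zero.
    """
    r = np.asarray(true_rul, dtype=float)
    r_hat = np.asarray(pred_rul, dtype=float)
    mask = r > 0
    ra = 1.0 - np.abs(r[mask] - r_hat[mask]) / r[mask]
    return float(ra.mean())
```

For instance, true RULs of 100, 80, 60, 40 and 20 minutes against predictions of 90, 84, 60, 30 and 25 minutes give an MAE of 5.8 minutes and a CRA of 0.87. A mean CRA over several bearings, such as the value reported over the 5 bearings under C1, would then be the average of the per-bearing CRA values.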

Paper Structure (33 sections, 16 equations, 8 figures, 10 tables, 2 algorithms)

Figures (8)

  • Figure 1: Evolution of the frequency spectrum across the stages of a typical bearing failure. Initially, a healthy bearing exhibits only frequencies linked to shaft phenomena, such as imbalance or misalignment. Stage 1 introduces ultrasonic frequencies that only specialized sensors can detect, with no visible defects on the bearing. Stage 2 is marked by signals at the natural resonance frequencies of the bearing components, alongside the first defects visible upon inspection. Stage 3 features the fundamental defect frequencies and their harmonics, modulated by the shaft speed, indicating that the defects are spreading. Stage 4, the final stage before complete failure, is characterized by a mix of modulated fundamental frequencies and harmonics and, ultimately, a shift towards a random noise floor.
  • Figure 2: Outline of the proposed method. Historical data for 5 ball bearings form a bearing dataset $\mathcal{D}$ (see the dataset section). We first extract time-domain features from all bearings (see the feature extraction section). Then, to train a survival model for RUL prediction, we designate training and test bearings and perform event detection in the frequency domain of the training bearings (see the event detection section). Afterwards, we convert the temporal training dataset into a supervised learning dataset by computing a rolling average (see the data preprocessing section). We then apply random, independent censoring using the proposed censoring algorithm (see the data censoring section), so that the event of interest is observed for only a portion of the instances. The processed dataset $\tilde{\mathcal{D}}$ thus contains $N$ instances (rows) with several time-domain features (e.g., entropy, kurtosis) and either the time-to-event (filled dot) or the censoring time (hollow dot). We use this dataset (features and event information) to train a survival model $\mathcal{M}$ (see the survival models section). This model estimates the individual survival distribution (ISD) of the failure event for a new test bearing $\boldsymbol{x}_{i}$, denoted as $\hat{s}_{i}$. This ISD gives the probability that failure (i.e., the onset of degradation) occurs after $t$ minutes post startup, for all $t>0$. It can also be used to estimate the time-to-event for $\boldsymbol{x}_{i}$. For example, the time at which the survival (event) curve intersects the dashed horizontal line at 50% is referred to as the predicted median survival time.
  • Figure 3: Visualization of the normalized $|\Delta_{KL}|$ and the threshold function for a single critical band in the proposed event detection algorithm. The illustration depicts the segmentation of the signal into sequential windows ($w$). It then shows the estimation of the KL divergence between the probability distributions estimated for consecutive windows. The diagram shows examples of the estimated probability density functions and highlights the point at which the divergence exceeds the threshold, indicating a likely significant deterioration in the bearing's performance.
  • Figure 4: Predicted survival probability $S(t)$ using the Kaplan-Meier (KM) estimator under various amounts of censoring. The shaded areas around the curves represent empirical 95% confidence intervals, computed using the Greenwood formula sawyer2003greenwood. The confidence intervals widen as the level of censoring increases, indicating greater uncertainty in the survival probability.
  • Figure 5: Event distribution of censored and uncensored event times. As censoring increases, the number of observed events decreases.
  • ...and 3 more figures
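Figures 2 and 4 rely on two survival-curve operations. The first is estimating $S(t)$ with the Kaplan-Meier estimator plus Greenwood confidence intervals. The second is reading off the median survival time where the curve crosses 50%. What follows is a minimal sketch of both, assuming plain (linear) Greenwood intervals clipped to [0, 1]. Libraries often default to a log-log transformed interval instead, so the bands in Figure 4 may have been computed differently. In the paper, the median is read from the individual survival distribution predicted by a survival model. Applying the same rule to a population KM curve here is for illustration only, and the function names are hypothetical.

```python
import numpy as np


def kaplan_meier(times, events, z: float = 1.96):
    """Kaplan-Meier estimate of S(t) with plain Greenwood confidence intervals.

    times  : observed time per instance (failure time or censoring time)
    events : 1 if the failure was observed, 0 if the instance is right-censored
    Returns the distinct failure times, S(t) at each of them, and the lower/upper
    confidence bounds (95% for the default z = 1.96).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, greenwood_sum = 1.0, 0.0
    s_vals, lower, upper = [], [], []
    for t in event_times:
        # Censored instances remain in the risk set up to their censoring time.
        n_at_risk = int(np.sum(times >= t))
        d = int(np.sum((times == t) & (events == 1)))  # failures observed at t
        surv *= 1.0 - d / n_at_risk
        if n_at_risk > d:
            # Greenwood term; undefined once every at-risk instance fails,
            # but S(t) is then 0, so the standard error below is 0 as well.
            greenwood_sum += d / (n_at_risk * (n_at_risk - d))
        se = surv * np.sqrt(greenwood_sum)
        s_vals.append(surv)
        lower.append(max(0.0, surv - z * se))
        upper.append(min(1.0, surv + z * se))
    return event_times, np.array(s_vals), np.array(lower), np.array(upper)


def median_survival_time(event_times, surv) -> float:
    """First time at which the step survival curve falls to 0.5 or below (inf if never)."""
    idx = np.flatnonzero(np.asarray(surv) <= 0.5)
    return float(event_times[idx[0]]) if idx.size else float("inf")
```

Consider observed times of 2, 3, 3, 5, 6 and 8 minutes, where the second 3 and the 6 are censored. The estimated survival probabilities at the failure times 2, 3, 5 and 8 are 5/6, 2/3, 4/9 and 0, so the median survival time is 5 minutes.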