SRTFD: Scalable Real-Time Fault Diagnosis through Online Continual Learning

Dandan Zhao; Karthick Sharma; Hongpeng Yin; Yuxin Qi; Shuhao Zhang

TL;DR

SRTFD tackles real-time fault diagnosis in large-scale industrial systems by integrating online continual learning with three core components: Retrospect Coreset Selection (RCS) for data-efficient updates, Global Balance Technique (GBT) to mitigate class imbalance, and Confidence and Uncertainty-driven Pseudo-label Learning (CUPL) to leverage unlabeled data. The framework maintains a historical data buffer, constructs non-redundant coresets, and uses balanced, uncertainty-aware pseudo-labels to update the model continually, achieving substantial speedups while improving fault-detection performance. Experimental validation on real-world HRS and simulated TEP and CARLA datasets across class-incremental and varying-condition scenarios demonstrates superior recall, precision, F1, and G-mean with significantly reduced training time compared to state-of-the-art baselines. These results indicate SRTFD is a practical, scalable solution for online FD in industrial settings with streaming, imbalanced, and sparsely labeled data.

Abstract

Fault diagnosis (FD) is essential for maintaining operational safety and minimizing economic losses by detecting system abnormalities. Recently, deep learning (DL)-driven FD methods have gained prominence, offering significant improvements in precision and adaptability by using extensive datasets and advanced DL models. Modern industrial environments, however, demand FD methods that can handle new fault types, dynamic conditions, and large-scale data while providing real-time responses with minimal prior information. Although online continual learning (OCL) shows potential for meeting these requirements by enabling DL models to learn continuously from streaming data, it faces challenges such as data redundancy, class imbalance, and limited labeled data. To overcome these limitations, we propose SRTFD, a scalable real-time fault diagnosis framework that enhances OCL with three critical methods: Retrospect Coreset Selection (RCS), which selects the most relevant data to reduce redundant training and improve efficiency; Global Balance Technique (GBT), which ensures balanced coreset selection and robust model performance; and Confidence and Uncertainty-driven Pseudo-label Learning (CUPL), which updates the model using unlabeled data for continuous adaptation. Extensive experiments on a real-world dataset and two public simulated datasets demonstrate SRTFD's effectiveness and potential for providing advanced, scalable, and precise fault diagnosis in modern industrial systems.
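To make the idea behind confidence- and uncertainty-driven pseudo-labeling concrete, the following is a minimal sketch and not the paper's CUPL formulation. It assumes that confidence is the maximum softmax probability and that uncertainty is the normalized Shannon entropy of the predicted distribution; the function names and the thresholds `conf_threshold` and `unc_threshold` are hypothetical choices for illustration only.

```python
import math
from typing import List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one sample's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: List[float]) -> float:
    """Shannon entropy divided by log(K), so the result lies in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def pseudo_label(
    logits: List[float],
    conf_threshold: float = 0.9,  # hypothetical value, not from the paper
    unc_threshold: float = 0.3,   # hypothetical value, not from the paper
) -> Optional[int]:
    """Return a pseudo-label only if the prediction is confident AND certain."""
    probs = softmax(logits)
    confidence = max(probs)
    label = probs.index(confidence)
    uncertainty = normalized_entropy(probs)
    if confidence >= conf_threshold and uncertainty <= unc_threshold:
        return label
    return None


def select_pseudo_labeled(
    batch_logits: List[List[float]],
    conf_threshold: float = 0.9,
    unc_threshold: float = 0.3,
) -> List[Tuple[int, int]]:
    """Return (sample_index, label) pairs for samples that pass both checks."""
    selected = []
    for i, logits in enumerate(batch_logits):
        label = pseudo_label(logits, conf_threshold, unc_threshold)
        if label is not None:
            selected.append((i, label))
    return selected


if __name__ == "__main__":
    batch = [[8.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 5.0, 0.0]]
    # The uniform prediction in the middle is rejected; the other two are kept.
    print(select_pseudo_labeled(batch))
```

In a streaming setting, the accepted pairs would be merged with the labeled samples before the next model update, while rejected samples would be left unlabeled. How SRTFD actually weights or combines the two signals is not specified in this summary.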

Paper Structure (26 sections, 16 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Background and motivation
Traditional Data-driven-based FD
Online Continuous Learning
Motivation
Problem Statement
Methodology
Retrospect Coreset Selection (RCS)
Global Balance Technique (GBT)
Confidence and Uncertainty-driven Pseudo-label Learning (CUPL)
Experiments
Experimental Setup
Performance Comparison
Ablation Study
Analysis and Discussion
...and 11 more sections

Figures (7)

Figure 1: The SRTFD framework conducts fault diagnosis in two stages: Prediction and Training. In the Prediction stage, unlabeled samples are pseudo-labeled and combined with labeled data for model updates via CUPL. In the Training stage, the model is iteratively trained using RCS and GBT to ensure effective learning despite class imbalance.
Figure 2: Comparison of the existing approach (Li) and our coreset selection approach. $\text{Ba}_\text{t}$ and $\text{S}_\text{t}$ denote the batch data and the selected coreset at time t, respectively.
Figure 3: Performance for different coreset ratios $cr$ and cluster numbers $uc$ on the HRS dataset in the class-incremental setting.
Figure 4: Performance for different thresholds and weights on the HRS dataset in the class-incremental setting.
Figure 5: Performance for different coreset ratios $cr$ and cluster numbers $uc$ on all datasets in the class-incremental setting.
...and 2 more figures
