UniFault: A Fault Diagnosis Foundation Model from Bearing Data

Emadeldeen Eldele, Mohamed Ragab, Xu Qing, Edward, Zhenghua Chen, Min Wu, Xiaoli Li, Jay Lee

TL;DR

UniFault addresses cross-dataset heterogeneity in bearing fault diagnosis by building a foundation model pretrained on over 6.9 million samples and employing a universal preprocessing pipeline with channel unification and cross-domain temporal fusion. The model uses a Transformer backbone trained with contrastive self-supervised learning, enabling strong few-shot performance across diverse machines and operating conditions. Key contributions include the large-scale heterogeneous pretraining, the data harmonization and fusion strategies, and demonstrated state-of-the-art few-shot results on real-world bearing datasets. This work advances predictive maintenance by delivering a scalable, robust, and adaptable fault-diagnosis foundation model suitable for deployment in varied industrial settings and real-time monitoring scenarios.

Abstract

Machine fault diagnosis (FD) is a critical task for predictive maintenance, enabling early fault detection and preventing unexpected failures. Despite its importance, existing FD models are operation-specific and generalize poorly across diverse datasets. Foundation models (FMs) have demonstrated remarkable potential in both visual and language domains, achieving impressive generalization capabilities even with minimal data through few-shot or zero-shot learning. However, translating these advances to FD presents unique hurdles. Unlike the large-scale, cohesive datasets available for images and text, FD datasets are typically smaller and more heterogeneous, with significant variations in sampling frequencies and the number of channels across different systems and applications. This heterogeneity complicates the design of a universal architecture capable of effectively processing such diverse data while maintaining robust feature extraction and learning capabilities. In this paper, we introduce UniFault, a foundation model for fault diagnosis that systematically addresses these issues. Specifically, the model incorporates a comprehensive data harmonization pipeline featuring two key innovations. First, a unification scheme transforms multivariate inputs into standardized univariate sequences. Second, a novel cross-domain temporal fusion strategy mitigates distribution shifts and enriches sample diversity and count, improving the model's generalization across varying conditions. UniFault is pretrained on over 6.9 million samples spanning diverse FD datasets, enabling superior few-shot performance. Extensive experiments on real-world FD datasets demonstrate that UniFault achieves state-of-the-art performance, setting a new benchmark for fault diagnosis models and paving the way for more scalable and robust predictive maintenance solutions.

Paper Structure

This paper contains 36 sections, 7 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The overall design of UniFault. (1) We collect datasets from multiple heterogeneous sources with different sequence lengths, sampling rates, and channel counts. (2) Our preprocessing pipeline includes data normalization, sequence length standardization, unifying the channel dimension, and generating new samples via cross-domain temporal fusion. (3) We perform contrastive self-supervised pretraining of our Transformer-based backbone. (4) The pretrained model can be fine-tuned with few-shot samples.
  • Figure 2: Effect of cross-dataset temporal fusion on the IMS and PU datasets for both UniFault models.
  • Figure 3: Effect of model depth on the Base model performance across the IMS, PU, and M01 datasets (mean $\pm$ std over three seeds).
  • Figure 4: t-SNE of penultimate-layer features for CWRU and MFPT using UniFault-Base before vs. after UniFault. Each point is a sample; color denotes class. UniFault tightens intra-class clusters and increases inter-class margins across both datasets.
  • Figure 5: K-shot experiment on IMS with UniFault-Lite and UniFault-Base (mean $\pm$ std over three seeds).
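
The four preprocessing steps named in the Figure 1 caption (normalization, sequence length standardization, channel unification, cross-domain temporal fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the text above does not specify how each step is carried out, so every function name and every concrete choice here is an assumption made for illustration. In particular, the sketch assumes per-channel z-scoring, crop-or-zero-pad to a hypothetical length of 1024, treating each channel as its own univariate sequence, and fusing two sequences by splicing them at a random cut point.

```python
import numpy as np


def zscore(x, eps=1e-8):
    """Normalize each channel of a (channels, length) window to zero mean, unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)


def fix_length(x, target_len):
    """Crop or zero-pad the time axis of a (channels, length) window to target_len."""
    length = x.shape[-1]
    if length >= target_len:
        return x[:, :target_len]
    return np.pad(x, [(0, 0), (0, target_len - length)])


def unify_channels(x):
    """ASSUMED scheme: every channel becomes its own univariate sequence.

    (channels, length) -> (channels, 1, length)
    """
    return x[:, None, :]


def temporal_fusion(a, b, rng):
    """ASSUMED scheme: head of `a` spliced onto the tail of `b` at a random cut.

    Both inputs have shape (1, length); the output keeps that shape.
    """
    length = a.shape[-1]
    cut = int(rng.integers(1, length))  # cut point in 1 .. length - 1
    return np.concatenate([a[:, :cut], b[:, cut:]], axis=-1)


def preprocess(windows, target_len=1024):
    """Apply normalization, length standardization, and channel unification.

    `windows` is a list of (channels, length) arrays that may differ in both
    dimensions. Returns one pooled array of shape (n_sequences, 1, target_len).
    """
    unified = [unify_channels(fix_length(zscore(w), target_len)) for w in windows]
    return np.concatenate(unified, axis=0)


rng = np.random.default_rng(0)
window_a = rng.normal(size=(2, 1500))  # stand-in for a 2-channel recording
window_b = rng.normal(size=(1, 800))   # stand-in for a 1-channel recording from another source
pool = preprocess([window_a, window_b])
fused = temporal_fusion(pool[0], pool[2], rng)  # mix sequences from the two sources
print(pool.shape, fused.shape)
```

Under these assumptions, recordings with different channel counts and lengths end up in one pool of fixed-length univariate sequences, and fusion adds new samples without changing that shape, which is what lets a single backbone consume all sources.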