UniFault: A Fault Diagnosis Foundation Model from Bearing Data
Emadeldeen Eldele, Mohamed Ragab, Xu Qing, Edward, Zhenghua Chen, Min Wu, Xiaoli Li, Jay Lee
TL;DR
UniFault addresses cross-dataset heterogeneity in bearing fault diagnosis by building a foundation model pretrained on over 6.9 million samples and employing a universal preprocessing pipeline with channel unification and cross-domain temporal fusion. The model uses a Transformer backbone trained with contrastive self-supervised learning, enabling strong few-shot performance across diverse machines and operating conditions. Key contributions include the large-scale heterogeneous pretraining, the data harmonization and fusion strategies, and demonstrated state-of-the-art few-shot results on real-world bearing datasets. This work advances predictive maintenance by delivering a scalable, robust, and adaptable fault-diagnosis foundation model suitable for deployment in varied industrial settings and real-time monitoring scenarios.
Abstract
Machine fault diagnosis (FD) is a critical task for predictive maintenance, enabling early fault detection and preventing unexpected failures. Despite its importance, existing FD models are operation-specific and generalize poorly across diverse datasets. Foundation models (FMs) have demonstrated remarkable potential in both the visual and language domains, generalizing well even with minimal data through few-shot or zero-shot learning. However, translating these advances to FD presents unique hurdles. Unlike the large-scale, cohesive datasets available for images and text, FD datasets are typically smaller and more heterogeneous, with significant variations in sampling frequency and number of channels across systems and applications. This heterogeneity complicates the design of a universal architecture that can process such diverse data while maintaining robust feature extraction and learning capabilities. In this paper, we introduce UniFault, a foundation model for fault diagnosis that systematically addresses these issues. Specifically, the model incorporates a comprehensive data harmonization pipeline featuring two key innovations. First, a unification scheme transforms multivariate inputs into standardized univariate sequences. Second, a novel cross-domain temporal fusion strategy mitigates distribution shifts and increases both the diversity and the number of samples, improving the model's generalization across varying conditions. UniFault is pretrained on over 6.9 million samples spanning diverse FD datasets, enabling superior few-shot performance. Extensive experiments on real-world FD datasets demonstrate that UniFault achieves state-of-the-art performance, setting a new benchmark for fault diagnosis models and paving the way for more scalable and robust predictive maintenance solutions.
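The abstract does not say how the unification scheme is implemented. As a rough illustration only, the sketch below shows one plausible reading: each channel of a multichannel recording is treated as its own univariate sequence, linearly resampled to a common length, and z-normalized. The function names `resample_linear` and `unify_channels`, the `target_len` parameter, and the choice of linear interpolation and z-normalization are all assumptions made for this example, not details taken from the paper.

```python
from statistics import fmean, pstdev


def resample_linear(seq, target_len):
    """Linearly interpolate a 1-D sequence onto target_len evenly spaced points."""
    n = len(seq)
    if n == 1 or target_len == 1:
        return [float(seq[0])] * target_len
    out = []
    for i in range(target_len):
        # Fractional position of output point i in the input index space.
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(seq[lo] + (seq[hi] - seq[lo]) * frac)
    return out


def unify_channels(sample, target_len):
    """Turn a multichannel sample (a list of channels) into univariate sequences.

    Hypothetical scheme: one output sequence per input channel, each resampled
    to target_len points and z-normalized. Channels may have different lengths.
    """
    unified = []
    for channel in sample:
        seq = resample_linear(channel, target_len)
        mean, std = fmean(seq), pstdev(seq)
        scale = std if std > 1e-12 else 1.0  # leave constant channels at zero
        unified.append([(v - mean) / scale for v in seq])
    return unified


# Two channels recorded at different lengths become two length-8 sequences.
unified = unify_channels([[0.0, 1.0, 2.0, 3.0], [5.0, 5.0]], target_len=8)
```

Under this reading, recordings with any number of channels and any sampling rate end up as sequences of one shape, which is what would let a single backbone consume all datasets. The pipeline the paper actually uses may differ in each of these choices.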
