Multi-scale Quaternion CNN and BiGRU with Cross Self-attention Feature Fusion for Fault Diagnosis of Bearing
Huanbai Liu, Fanlong Zhang, Yin Tan, Lian Huang, Yan Li, Guoheng Huang, Shenghong Luo, An Zeng
TL;DR
This work targets robust bearing fault diagnosis under noise and domain shift by introducing MQCCAF, a lightweight end-to-end model that integrates a multi-scale quaternion CNN (MQCNN), a cross self-attention feature fusion (CSAFF) module, and a BiGRU classifier. MQCNN extracts global and multi-scale quaternion features from raw vibration signals, while CSAFF selectively fuses these features across scales to reduce redundancy and emphasize discriminative regions. The BiGRU-based classifier captures temporal dependencies, yielding state-of-the-art accuracy on CWRU (up to 99.99%), MFPT (100%), and Ottawa (99.21%), with strong anti-noise performance and cross-domain transfer capability. The approach offers a practical, robust, and efficient solution for real-time fault diagnosis under varying loads and in noisy industrial environments.
Abstract
In recent years, deep learning has led to significant advances in bearing fault diagnosis (FD). Most techniques focus on achieving higher accuracy, but they are sensitive to noise and lack robustness, which leaves them with insufficient domain adaptation and anti-noise ability. A comparison of existing studies also reveals that giving equal attention to all features fails to differentiate their significance. In this work, we propose a novel FD model that integrates a multi-scale quaternion convolutional neural network (MQCNN), a bidirectional gated recurrent unit (BiGRU), and cross self-attention feature fusion (CSAFF). Our design contributions lie in two modules, MQCNN and CSAFF. First, MQCNN applies quaternion convolution to a multi-scale architecture for the first time, aiming to extract the rich hidden features of the original signal at multiple scales. Then, the extracted multi-scale information is fed into CSAFF for feature fusion, where CSAFF incorporates a cross self-attention mechanism to enhance the discriminative interaction representation within features. Finally, BiGRU captures temporal dependencies and a softmax layer performs fault classification, achieving accurate FD. To assess the efficacy of our approach, we conduct experiments on three public datasets (CWRU, MFPT, and Ottawa) and compare it with other strong methods. The results confirm its state-of-the-art performance, with average accuracies of up to 99.99%, 100%, and 99.21% on the CWRU, MFPT, and Ottawa datasets, respectively. Moreover, we perform practical tests and ablation experiments to validate the efficacy and robustness of the proposed approach. Code is available at https://github.com/mubai011/MQCCAF.
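To make the idea of multi-scale quaternion convolution concrete, the following is a minimal sketch in plain Python and is not the authors' implementation. It assumes the signal has already been arranged as quaternion-valued samples (r, i, j, k); how the paper builds quaternions from raw vibration data is not stated in the abstract. The function names (`hamilton_product`, `quaternion_conv1d`, `multi_scale_features`) and the scale labels are hypothetical. The only fixed ingredient is the Hamilton product, which is the standard definition of quaternion multiplication.

```python
from typing import Dict, List, Tuple

Quaternion = Tuple[float, float, float, float]  # components (r, i, j, k)


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Standard quaternion multiplication p * q (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_conv1d(
    signal: List[Quaternion], kernel: List[Quaternion]
) -> List[Quaternion]:
    """Valid-mode 1-D sliding window; each tap is weight * sample via the Hamilton product."""
    out: List[Quaternion] = []
    for start in range(len(signal) - len(kernel) + 1):
        acc = [0.0, 0.0, 0.0, 0.0]
        for offset, weight in enumerate(kernel):
            prod = hamilton_product(weight, signal[start + offset])
            acc = [s + t for s, t in zip(acc, prod)]
        out.append((acc[0], acc[1], acc[2], acc[3]))
    return out


def multi_scale_features(
    signal: List[Quaternion], kernels_by_scale: Dict[str, List[Quaternion]]
) -> Dict[str, List[Quaternion]]:
    """Apply one quaternion kernel per scale (hypothetical scale names)."""
    return {
        name: quaternion_conv1d(signal, kernel)
        for name, kernel in kernels_by_scale.items()
    }


if __name__ == "__main__":
    # Toy quaternion-valued signal with 6 samples (illustrative values only).
    signal = [(float(n), 0.5, -0.5, 1.0) for n in range(6)]
    identity = (1.0, 0.0, 0.0, 0.0)
    kernels = {"small": [identity], "large": [identity, identity, identity]}
    features = multi_scale_features(signal, kernels)
    for name, feats in features.items():
        print(name, len(feats))
```

Longer kernels yield shorter valid-mode outputs, so a real multi-scale design would need padding or pooling to align the scales before any fusion step. That alignment, the learned weights, and the attention-based fusion are outside the scope of this sketch.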
