BearLLM: A Prior Knowledge-Enhanced Bearing Health Management Framework with Unified Vibration Signal Representation

Haotian Peng, Jiawei Liu, Jinsong Du, Jie Gao, Wei Wang

TL;DR

BearLLM presents a novel multimodal bearing health management framework that unifies anomaly detection, fault diagnosis, maintenance recommendations, and risk analysis via vibration signals and prompts to an LLM. It introduces a prior knowledge‑enhanced unified vibration signal representation, combining adaptive sampling, DCN‑based frequency alignment, and fault‑free references to produce a robust input for an FCN and a subsequent alignment to text embeddings. A large MBHM dataset with paired vibration signals and textual descriptions supports multi‑task evaluation, and experimental results show state‑of‑the‑art performance across nine public benchmarks, including strong zero‑shot generalization and favorable user‑study responses after fine‑tuning. The approach delivers plug‑and‑play components (DCN, FCN, alignment layer) and provides a foundation for scalable industrial multimodal models, with potential extensions to additional modalities and remaining useful life prediction.

Abstract

We propose a bearing health management framework leveraging large language models (BearLLM), a novel multimodal model that unifies multiple bearing-related tasks by processing user prompts and vibration signals. Specifically, we introduce a prior knowledge-enhanced unified vibration signal representation to handle various working conditions across multiple datasets. This involves adaptively sampling the vibration signals based on the sampling rate of the sensor, incorporating the frequency domain to unify input dimensions, and using a fault-free reference signal as an auxiliary input. To extract features from vibration signals, we first train a fault classification network, then convert and align the extracted features into word embeddings, and finally concatenate these with the text embeddings as input to an LLM. To evaluate the performance of the proposed method, we constructed the first large-scale multimodal bearing health management (MBHM) dataset, including paired vibration signals and textual descriptions. With our unified vibration signal representation, BearLLM using one set of pre-trained weights achieves state-of-the-art performance on nine publicly available fault diagnosis benchmarks, outperforming specific methods designed for individual datasets. We provide a dataset, our model, and code to inspire future research on building more capable industrial multimodal models: https://github.com/SIA-IDE/BearLLM.
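
The three ingredients of the unified representation described above (sampling tied to the sensor rate, a frequency-domain transform that fixes the input dimension, and a fault-free reference channel) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "adaptive sampling" means cropping a fixed-duration segment, that the frequency transform is a plain DCT-II, and that all function names and the `target_len` parameter are hypothetical. A naive O(N²) DCT is used only to keep the example dependency-free; `scipy.fft.dct` would be the practical choice.

```python
import math

def crop_to_duration(signal, sampling_rate_hz, duration_s=1.0):
    """Keep a fixed-duration segment, so its length follows the sensor's rate (assumed scheme)."""
    n = int(round(sampling_rate_hz * duration_s))
    if len(signal) < n:
        raise ValueError("signal is shorter than the requested duration")
    return list(signal[:n])

def dct_ii(x):
    """Naive, unnormalized DCT-II; coefficient k corresponds to k / (2 * duration) Hz."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]

def to_fixed_length(coeffs, target_len):
    """Truncate or zero-pad so every sensor yields the same input dimension."""
    out = list(coeffs[:target_len])
    return out + [0.0] * (target_len - len(out))

def normalize(values):
    """Scale by the peak magnitude so amplitudes are comparable across datasets."""
    peak = max(abs(v) for v in values) or 1.0
    return [v / peak for v in values]

def unified_representation(query, reference, sampling_rate_hz, target_len=64):
    """Return [query_channel, reference_channel], each of length target_len."""
    channels = []
    for sig in (query, reference):
        segment = crop_to_duration(sig, sampling_rate_hz)
        channels.append(normalize(to_fixed_length(dct_ii(segment), target_len)))
    return channels
```

Because every segment spans the same duration, coefficient index k refers to the same physical frequency whatever the sampling rate, so truncating to `target_len` coefficients aligns the inputs from different sensors.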

Paper Structure

This paper contains 17 sections, 7 equations, 18 figures, 6 tables, and 3 algorithms.

Figures (18)

  • Figure 1: Comparison of existing bearing health management frameworks [chaleshtoriNovelBearingFault2024, niDatadrivenBearingHealth2024] with our proposed approach. Our BearLLM replaces the complex process of designing separate methods tailored to different conditions and tasks.
  • Figure 2: Architecture of our proposed BearLLM. Given a query vibration signal segment $X_v$ and a user instruction $X_t$ as input, the model retrieves a fault-free vibration signal segment $\tilde{X}_v$ with similar working conditions from the database as a reference. The two vibration signals are converted into a unified representation through DCN. A feature encoder identifies fault-related residuals between the two signals. The alignment layer converts these features into the word embedding $H_V$. Finally, an LLM takes $H_V$ together with the user text embedding $H_T$ to generate multi-task natural language responses, where $n_t$ represents the length of the encoded text embedding.
  • Figure 3: A sample from our MBHM dataset, including the vibration signal $X_v$, fault label $L_v$, working condition $C$, task-specific prompt text $X_t$, and response text $L_t$.
  • Figure 4: Structure of our proposed FCN. In the feature encoder, three wide convolutions are first used to extract main features, followed by three MSCAB blocks to transform and fuse multi-scale features for fault classification. The pre-trained FCN is used to initialize the feature extractor and alignment layer of BearLLM.
  • Figure 5: Accuracy and learning rate trends during training for different models. (a) Replacing the network of BearingFM with FCN increased accuracy and accelerated convergence. (b) Incorporating DCN into QCNN significantly mitigated overfitting. (c) Our proposed method exhibits the fastest convergence and the highest accuracy.
  • ...and 13 more figures
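
The reference-retrieval step in the Figure 2 caption (choosing a fault-free segment recorded under similar working conditions) can be sketched as a nearest-neighbor lookup. The captions do not say how similarity is measured, so everything here is an assumption made for illustration: the `ReferenceSegment` fields (`rpm`, `load_n`), the scaled Euclidean distance, and the scale constants are all hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class ReferenceSegment:
    """A fault-free segment stored with its working condition (hypothetical fields)."""
    rpm: float
    load_n: float
    signal: list = field(default_factory=list)

def condition_distance(a, b, rpm_scale=1000.0, load_scale=1000.0):
    """Scaled Euclidean distance between two (rpm, load) pairs (assumed metric)."""
    return math.hypot((a[0] - b[0]) / rpm_scale, (a[1] - b[1]) / load_scale)

def retrieve_reference(query_condition, database):
    """Return the fault-free segment whose working condition is closest to the query."""
    if not database:
        raise ValueError("reference database is empty")
    return min(
        database,
        key=lambda ref: condition_distance(query_condition, (ref.rpm, ref.load_n)),
    )

# Usage: pick a reference for a query recorded at 1750 rpm with no load.
database = [
    ReferenceSegment(rpm=1797.0, load_n=0.0, signal=[0.0, 0.1]),
    ReferenceSegment(rpm=1200.0, load_n=500.0, signal=[0.0, 0.2]),
    ReferenceSegment(rpm=3000.0, load_n=0.0, signal=[0.0, 0.3]),
]
best = retrieve_reference((1750.0, 0.0), database)
```

The retrieved segment would then be passed through the same representation as the query, so that the feature encoder can compare the two.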