LLM-YOLOMS: Large Language Model-based Semantic Interpretation and Fault Diagnosis for Wind Turbine Components

Yaru Li, Yanxue Wang, Meng Li, Xinming Li, Jianbo Feng

TL;DR

The paper presents LLM-YOLOMS, a hybrid framework that pairs the YOLOMS detector with a multimodal LLM and a domain-adapted LLM to deliver semantic fault interpretation and maintenance guidance for wind turbine components. A lightweight key-value (KV) mapping converts visual outputs into structured textual inputs while preserving fault attributes; Qwen2-VL and a domain-tuned Llama model then handle reasoning, fault analysis, and maintenance recommendations. Reported results include detection performance of mAP50 = 0.896 (per-class APs up to 0.955) and diagnostic-report scores of APS ≈ 0.89 and FCS ≈ 0.94, which the authors present as improved interpretability and decision support over detection-only pipelines. The pipeline runs end to end, from fault detection through analysis to operation and maintenance (O&M) guidance, and is built on UAV imagery, multi-scale data augmentation, and domain-specific fine-tuning.
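
To make the key-value mapping idea concrete, the following Python sketch turns a list of detections into structured text lines that an LLM could read. It is an illustration only: the `Detection` fields, the `detections_to_text` name, the output format, and the area-ratio attribute are all assumptions, not the authors' implementation. The labels "PS" and "SD" and the frequency field "f" follow the Figure 4 caption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """One detector output (hypothetical structure for illustration)."""
    label: str                                  # fault abbreviation, e.g. "PS" or "SD"
    confidence: float                           # detector score in [0, 1]
    box: tuple[float, float, float, float]      # (x1, y1, x2, y2) in pixels


def detections_to_text(detections, image_size):
    """Map detections to key=value text: per-class frequency, then per-box attributes."""
    width, height = image_size
    counts = Counter(d.label for d in detections)

    # Summary lines: how often each fault class occurs (the "f" field).
    lines = [f"fault={label}; f={counts[label]}" for label in sorted(counts)]

    # Detail lines: one per box, with a quantitative attribute (relative area).
    for i, d in enumerate(detections, start=1):
        x1, y1, x2, y2 = d.box
        area_ratio = (x2 - x1) * (y2 - y1) / (width * height)
        lines.append(
            f"#{i}: fault={d.label}; confidence={d.confidence:.2f}; "
            f"area_ratio={area_ratio:.3f}"
        )
    return "\n".join(lines)


example = [
    Detection("PS", 0.91, (0, 0, 100, 50)),
    Detection("SD", 0.80, (10, 10, 20, 20)),
    Detection("PS", 0.75, (0, 0, 10, 10)),
]
report_text = detections_to_text(example, image_size=(1000, 500))
print(report_text)
```

The resulting text could be placed in an LLM prompt alongside the image; which attributes are actually worth including is a design decision that the summary above does not settle.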

Abstract

The health condition of wind turbine (WT) components is crucial for ensuring stable and reliable operation. However, existing fault detection methods are largely limited to visual recognition, producing structured outputs that lack semantic interpretability and fail to support maintenance decision-making. To address these limitations, this study proposes an integrated framework that combines YOLOMS with a large language model (LLM) for intelligent fault analysis and diagnosis. Specifically, YOLOMS employs multi-scale detection and sliding-window cropping to enhance fault feature extraction, while a lightweight key-value (KV) mapping module bridges the gap between visual outputs and textual inputs. This module converts YOLOMS detection results into structured textual representations enriched with both qualitative and quantitative attributes. A domain-tuned LLM then performs semantic reasoning to generate interpretable fault analyses and maintenance recommendations. Experiments on real-world datasets demonstrate that the proposed framework achieves a fault detection accuracy of 90.6% and generates maintenance reports with an average accuracy of 89%, thereby improving the interpretability of diagnostic results and providing practical decision support for the operation and maintenance of wind turbines.
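
The sliding-window cropping step can be pictured with a short Python sketch that tiles a large image into overlapping square crops, so that small faults occupy more of each crop. The window size, overlap, border handling, and the `sliding_windows` name are assumptions chosen for illustration; the abstract does not specify them.

```python
def sliding_windows(width, height, window, overlap):
    """Return (x1, y1, x2, y2) crop boxes that cover a width x height image.

    `window` is the crop side length and `overlap` the shared margin between
    neighbouring crops, both in pixels (hypothetical parameters).
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    stride = window - overlap

    def starts(length):
        # An axis shorter than the window gets a single crop at the origin.
        if length <= window:
            return [0]
        positions = list(range(0, length - window, stride))
        # Add a final crop flush with the border so no pixels are missed.
        positions.append(length - window)
        return positions

    return [
        (x, y, min(x + window, width), min(y + window, height))
        for y in starts(height)
        for x in starts(width)
    ]


crops = sliding_windows(width=1000, height=640, window=640, overlap=128)
print(crops)
```

Each crop would be passed to the detector separately, and the resulting boxes shifted back by the crop's (x1, y1) offset before merging. That merging step is not shown here.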

Paper Structure

This paper contains 16 sections, 10 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Diagram of the framework using YOLOMS for LLM-assisted wind turbine component fault detection and diagnosis.
  • Figure 2: LLM-assisted fault detection and diagnostic support and functional framework for WT components using the YOLOMS model.
  • Figure 3: YOLOv12 network structure diagram.
  • Figure 4: Structural diagram of the lightweight key-value (KV) image-text mapping method. In the figure, “PS” and “SD” are abbreviations of fault names, and “f” is the frequency of occurrence of each fault.
  • Figure 5: Two sets of graph mapping results, whose label vectors have been converted to text by the KV module.
  • ...and 8 more figures