Cleaning Maintenance Logs with LLM Agents for Improved Predictive Maintenance

Valeriu Dimidov, Faisal Hawlader, Sasan Jafarnejad, Raphaël Frank

TL;DR

The paper tackles data quality challenges in industrial predictive maintenance (PdM) by introducing LLM-based agents for cleaning automotive maintenance logs. It presents a synthetic data framework, AgenticPdmDataCleaner, that generates paired clean and noisy logs, and it benchmarks six LLMs in a streaming, zero-shot setting using a Log Cleaning API. The study demonstrates strong performance on generic noise and highlights limitations in domain-specific, temporally inconsistent cases, pointing to remedies such as temporal validators and hybrid rule–LLM approaches. This work underscores the potential for autonomous, real-time PdM data curation and lays the groundwork for deploying LLM-driven cleaning pipelines in industrial environments, with validation on real de-identified logs left to future work.

Abstract

Economic constraints, limited availability of datasets for reproducibility, and shortages of specialized expertise have long been recognized as key challenges to the adoption and advancement of predictive maintenance (PdM) in the automotive sector. Recent progress in large language models (LLMs) presents an opportunity to overcome these barriers and speed up the transition of PdM from research to industrial practice. In this context, we explore the potential of LLM-based agents to support PdM cleaning pipelines. Specifically, we focus on maintenance logs, a critical data source for training well-performing machine learning (ML) models, but one often affected by errors such as typos, missing fields, near-duplicate entries, and incorrect dates. We evaluate LLM agents on cleaning tasks involving six distinct types of noise. Our findings show that LLMs are effective at handling generic cleaning tasks and offer a promising foundation for future industrial applications. While domain-specific errors remain challenging, these results highlight the potential for further improvements through specialized training and enhanced agentic capabilities.

Paper Structure

This paper contains 29 sections, 2 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: Fleet Monitoring, Repair, and Maintenance Logging Process.
  • Figure 2: Clean data excerpts: (a) Fleet Registry, (b) Sensor Table, (c) Service Operations Catalog, (d) Maintenance Log.
  • Figure 3: Noisy Maintenance Log.
  • Figure 4: Agent environment with data sources and Log Cleaning API. (1) A noisy record $m'_k$ is provided to the LLM-based agent; (2) the agent optionally queries enterprise data sources through database tools; (3) the agent issues a structured action to the Log Cleaning API: accept, reject, or update.
  • Figure 5: System Prompt.
  • ...and 2 more figures
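
The Figure 4 caption describes a loop in which each noisy record is passed to the agent, which answers with one of three structured actions: accept, reject, or update. The following is a minimal sketch of how such an action could be applied to a stream of records. It is not the paper's Log Cleaning API: the names `CleaningAction`, `apply_action`, and `clean_stream`, the dictionary record layout, and the `decide` callback (standing in for the LLM-based agent) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Literal, Optional


@dataclass
class CleaningAction:
    """Hypothetical structured action an agent could return for one record."""
    kind: Literal["accept", "reject", "update"]
    # Corrected field values; only meaningful for "update".
    updates: dict = field(default_factory=dict)


def apply_action(record: dict, action: CleaningAction) -> Optional[dict]:
    """Return the record to keep in the cleaned log, or None to drop it."""
    if action.kind == "accept":
        return dict(record)  # keep the record unchanged (as a copy)
    if action.kind == "reject":
        return None  # drop the record, e.g. a near-duplicate entry
    if action.kind == "update":
        if not action.updates:
            raise ValueError("update action needs at least one corrected field")
        unknown = set(action.updates) - set(record)
        if unknown:
            raise KeyError(f"unknown fields: {sorted(unknown)}")
        return {**record, **action.updates}  # overwrite only the given fields
    raise ValueError(f"unsupported action: {action.kind!r}")


def clean_stream(
    noisy_records: Iterable[dict],
    decide: Callable[[dict], CleaningAction],
) -> list:
    """Process records one at a time, as in a streaming setting.

    `decide` is a placeholder for the agent: it sees one noisy record
    and returns a single action for it.
    """
    cleaned = []
    for record in noisy_records:
        result = apply_action(record, decide(record))
        if result is not None:
            cleaned.append(result)
    return cleaned
```

In this sketch `decide` can be any callable, so a simple rule-based function can be swapped in to exercise the loop before an LLM is connected. Rejecting an update that names an unknown field is a design choice of the sketch, intended to stop a misbehaving agent from silently adding columns.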