Cleaning Maintenance Logs with LLM Agents for Improved Predictive Maintenance
Valeriu Dimidov, Faisal Hawlader, Sasan Jafarnejad, Raphaël Frank
TL;DR
The paper tackles data quality challenges in industrial predictive maintenance (PdM) by introducing LLM-based agents for cleaning automotive maintenance logs. It presents a synthetic data framework, AgenticPdmDataCleaner, that generates paired clean and noisy logs, and it benchmarks six LLMs in a streaming, zero-shot setting using a Log Cleaning API. The study reports strong performance on generic noise and identifies limitations in domain-specific, temporally inconsistent cases, for which it suggests remedies such as temporal validators and hybrid rule–LLM approaches. The work points to the potential of autonomous, real-time PdM data curation and lays groundwork for deploying LLM-driven cleaning pipelines in industrial environments, with validation on real de-identified logs left to future work.
Abstract
Economic constraints, limited availability of datasets for reproducibility, and shortages of specialized expertise have long been recognized as key challenges to the adoption and advancement of predictive maintenance (PdM) in the automotive sector. Recent progress in large language models (LLMs) presents an opportunity to overcome these barriers and speed up the transition of PdM from research to industrial practice. In this context, we explore the potential of LLM-based agents to support PdM data cleaning pipelines. Specifically, we focus on maintenance logs, a critical data source for training well-performing machine learning (ML) models, but one often affected by errors such as typos, missing fields, near-duplicate entries, and incorrect dates. We evaluate LLM agents on cleaning tasks involving six distinct types of noise. Our findings show that LLMs are effective at handling generic cleaning tasks and offer a promising foundation for future industrial applications. Domain-specific errors remain challenging, but the results suggest that specialized training and enhanced agentic capabilities could yield further improvements.
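To make the idea of paired clean and noisy logs concrete, the following is a minimal Python sketch of how four of the noise types named above (typos, missing fields, near-duplicate entries, incorrect dates) might be injected into a clean log, together with a simple temporal check of the kind a temporal validator could perform. It is an illustration under stated assumptions, not the AgenticPdmDataCleaner implementation: the `LogEntry` schema, the function names, and the specific corruption rules are all hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical schema; real maintenance logs will have other fields.
    vehicle_id: str
    service_date: date
    odometer_km: Optional[int]
    description: str


def add_typo(entry: LogEntry) -> LogEntry:
    """Swap the first two characters of the description (a simple typo)."""
    text = entry.description
    if len(text) < 2:
        return entry
    return replace(entry, description=text[1] + text[0] + text[2:])


def drop_odometer(entry: LogEntry) -> LogEntry:
    """Simulate a missing field by blanking the odometer reading."""
    return replace(entry, odometer_km=None)


def shift_date(entry: LogEntry, days: int) -> LogEntry:
    """Simulate an incorrect date by moving the service date."""
    return replace(entry, service_date=entry.service_date + timedelta(days=days))


def make_noisy(clean: List[LogEntry]) -> List[LogEntry]:
    """Return a noisy copy of `clean`; the clean list itself is left unchanged."""
    noisy: List[LogEntry] = []
    for i, entry in enumerate(clean):
        kind = i % 3  # cycle deterministically through three noise types
        if kind == 0:
            noisy.append(add_typo(entry))
        elif kind == 1:
            noisy.append(drop_odometer(entry))
        else:
            noisy.append(shift_date(entry, days=-365))
    if clean:
        # Near-duplicate: the first record again, differing only in letter case.
        near_dup = replace(clean[0], description=clean[0].description.lower())
        noisy.insert(1, near_dup)
    return noisy


def date_violations(logs: List[LogEntry]) -> List[int]:
    """Indices of entries dated earlier than the latest accepted entry
    for the same vehicle, assuming logs arrive in recorded order."""
    last_seen: Dict[str, date] = {}
    flagged: List[int] = []
    for i, entry in enumerate(logs):
        prev = last_seen.get(entry.vehicle_id)
        if prev is not None and entry.service_date < prev:
            flagged.append(i)
        else:
            last_seen[entry.vehicle_id] = entry.service_date
    return flagged
```

Calling `make_noisy(clean)` yields the noisy half of a clean/noisy pair, and `date_violations` flags entries whose dates run backwards for a vehicle. A rule of this kind catches only ordering errors; it is meant to show where a deterministic check could complement an LLM agent, not to replace one.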
