System Log Parsing with Large Language Models: A Review
Viktor Beck, Max Landauer, Markus Wurzenberger, Florian Skopik, Andreas Rauber
TL;DR
This paper systematically reviews the landscape of log parsing based on large language models (LLMs), synthesizing 29 methods and benchmarking seven open-source approaches on public datasets. It presents a unified process pipeline, clarifies terminology across General Properties, Processing Steps, and Reproducibility, and evaluates both effectiveness and efficiency with a transparent benchmark. Key findings show that while LLM-based parsers such as LogBatcher and LILAC can outperform traditional baselines in some settings, reproducibility and comparability remain major challenges due to heterogeneous datasets, metrics, and reporting. The work highlights effective techniques, namely in-context learning (ICL), retrieval-augmented generation (RAG), caching, and template revision, and argues for standardized benchmarks and reporting practices to advance practical, reproducible log parsing with LLMs.
Abstract
Log data provides crucial insights for tasks like monitoring, root cause analysis, and anomaly detection. Due to the vast volume of logs, automated log parsing is essential to transform semi-structured log messages into structured representations. Recent advances in large language models (LLMs) have introduced the new research field of LLM-based log parsing. Despite promising results, there is no structured overview of the approaches in this relatively new research field, whose earliest advances were published in late 2023. This work systematically reviews 29 LLM-based log parsing methods. We benchmark seven of them on public datasets and critically assess their comparability and the reproducibility of their reported results. Our findings summarize the advances of this new research field, with insights on how to report results, which datasets, metrics, and terminology to use, and which inconsistencies to avoid. Code and results are made publicly available for transparency.
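
To make the idea of transforming semi-structured log messages into structured representations concrete, the following is a minimal sketch of template extraction. It is not one of the reviewed methods: the hand-written regular expressions are hypothetical stand-ins for the variable detection that an LLM-based parser would perform, the function names and sample messages are invented for illustration, and the `<*>` placeholder is assumed here as the wildcard notation commonly used in log parsing benchmarks.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical masking rules (assumption): an LLM-based parser would infer
# which tokens are variable instead of relying on fixed patterns like these.
_VARIABLE = re.compile(
    r"\b\d{1,3}(?:\.\d{1,3}){3}\b"  # IPv4 addresses
    r"|\b\d+\b"                     # plain integers
)


def parse_log(message: str) -> Tuple[str, List[str]]:
    """Split one log message into a template and its variable parameters."""
    params = _VARIABLE.findall(message)
    template = _VARIABLE.sub("<*>", message)
    return template, params


def group_by_template(messages: List[str]) -> Dict[str, List[List[str]]]:
    """Group messages by extracted template, keeping each message's parameters."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for message in messages:
        template, params = parse_log(message)
        groups[template].append(params)
    return dict(groups)


# Illustrative messages, not taken from any dataset.
sample_logs = [
    "Connection from 10.0.0.5 closed after 42 seconds",
    "Connection from 192.168.1.7 closed after 3 seconds",
    "Disk usage at 91 percent",
]
parsed = group_by_template(sample_logs)
```

In this sketch the first two messages collapse into one template, which is the structured representation that downstream tasks such as anomaly detection would consume.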
