LLM-SrcLog: Towards Proactive and Unified Log Template Extraction via Large Language Models

Jiaqi Sun, Wei Li, Heng Zhang, Chutong Ding, Shiyou Qian, Jian Cao, Guangtao Xue

TL;DR

The paper tackles the problem of reactive, log-centric log parsing by proposing LLM-SrcLog, a proactive framework that derives log templates from source code using a Static Code Analyzer and an LLM-driven White-box Extractor, complemented by a Drain3-based Black-box extractor for logs with unavailable code. It demonstrates that pre-deployment, code-grounded templates improve template quality while maintaining online parsing efficiency, bridging the gap between semantic understanding and practical deployment. Across Hadoop, Zookeeper, and Sunfire-Compute, LLM-SrcLog yields higher F1 scores than strong LLM baselines and achieves online parsing latency similar to traditional data-driven methods, orders of magnitude faster than per-log LLM inference. Real-world deployment in Alibaba production environments further validates its utility for troubleshooting and root-cause analysis, illustrating tangible observability improvements in large-scale systems.

Abstract

Log parsing transforms raw logs into structured templates containing constants and variables. It underpins anomaly detection, failure diagnosis, and other AIOps tasks. Current parsers are mostly reactive and log-centric: they infer templates only from logs and largely overlook the source code. This restricts their capacity to grasp dynamic log structures or adjust to evolving systems. Moreover, per-log LLM inference is too costly for practical deployment. In this paper, we propose LLM-SrcLog, a proactive and unified framework for log template parsing. It extracts templates directly from source code prior to deployment and supplements them with data-driven parsing for logs without available code. LLM-SrcLog integrates a cross-function static code analyzer to reconstruct meaningful logging contexts, an LLM-based white-box template extractor with post-processing to distinguish constants from variables, and a black-box template extractor that incorporates data-driven clustering for remaining unmatched logs. Experiments on two public benchmarks (Hadoop and Zookeeper) and a large-scale industrial system (Sunfire-Compute) show that, compared to two LLM-based baselines, LLM-SrcLog improves average F1-score by 2-17% and 8-35%, respectively. Meanwhile, its online parsing latency is comparable to data-driven methods and about 1,000 times faster than per-log LLM parsing. LLM-SrcLog achieves a near-ideal balance between speed and accuracy. Finally, we validate the effectiveness of LLM-SrcLog through practical case studies in a real-world production environment.
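The two-stage flow described in the abstract (templates derived from source code first, data-driven parsing for whatever remains unmatched) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the white-box step here is a plain regex substitution over logging format strings rather than a static analyzer plus an LLM, and the black-box step is a crude digit-masking heuristic standing in for Drain3-style clustering. All function names, format strings, and log lines below are hypothetical.

```python
import re
from collections import defaultdict

PLACEHOLDER = "<*>"

def format_string_to_template(fmt: str) -> str:
    """Assumed white-box stand-in: map '{}' and '%s'/'%d' placeholders to '<*>'."""
    return re.sub(r"\{\}|%[sd]", PLACEHOLDER, fmt)

def template_to_regex(template: str) -> re.Pattern:
    """Compile a template into an anchored regex; each '<*>' captures one variable."""
    constants = [re.escape(part) for part in template.split(PLACEHOLDER)]
    return re.compile("^" + "(.+?)".join(constants) + "$")

def mask_variables(message: str) -> str:
    """Assumed black-box stand-in (not Drain3): mask every token containing a digit."""
    return " ".join(
        PLACEHOLDER if any(ch.isdigit() for ch in tok) else tok
        for tok in message.split()
    )

def parse(logs, source_format_strings):
    """Match logs against code-derived templates first, then fall back to masking."""
    templates = [format_string_to_template(f) for f in source_format_strings]
    compiled = [(t, template_to_regex(t)) for t in templates]
    grouped = defaultdict(list)
    for line in logs:
        for template, pattern in compiled:
            if pattern.match(line):
                grouped[template].append(line)
                break
        else:
            # No code-derived template matched: use the data-driven fallback.
            grouped[mask_variables(line)].append(line)
    return dict(grouped)

# Hypothetical inputs for illustration only.
formats = ["Received block {} of size {} from {}", "Connection to %s closed"]
logs = [
    "Received block blk_1 of size 67108864 from 10.0.0.1",
    "Connection to node7 closed",
    "Session 0x15 expired",
    "Session 0x2a expired",
]
result = parse(logs, formats)
```

Because the code-derived templates are compiled once before any logs arrive, the online step reduces to regex matching, which is the property the abstract relies on when it compares latency against per-log LLM inference.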

Paper Structure

This paper contains 31 sections, 2 figures, 6 tables.

Figures (2)

  • Figure 2: Temporal trends of error logs and network retransmissions for App1 over a 24-hour period. In both subfigures, the horizontal axis denotes time within the observation window. In the error-log subfigure, the vertical axis denotes the average number of error-level log messages per minute that are successfully matched to templates. In the network subfigure, the vertical axis denotes the average network retransmission rate per minute.
  • Figure 3: Temporal trends of error logs and network retransmissions for App2 over a 24-hour period. In both subfigures, the horizontal axis denotes time within the observation window. In the error-log subfigure, the vertical axis denotes the average number of error-level log messages per minute matched to templates. In the network subfigure, the vertical axis denotes the average network retransmission rate per minute.