Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies

Yilun Liu, Shimin Tao, Weibin Meng, Jingyu Wang, Wenbing Ma, Yanqing Zhao, Yuhang Chen, Hao Yang, Yanfei Jiang, Xun Chen

TL;DR

This paper tackles the dual challenges of online log analysis and interpretability by introducing LogPrompt, a prompt-strategy framework that leverages large language models without requiring in-domain training. By standardizing input/output formats and employing self-prompting, chain-of-thought prompts, and in-context demonstrations, LogPrompt achieves strong zero-shot performance on log parsing and anomaly detection across nine datasets, and provides useful, readable explanations validated by practitioners. The approach also demonstrates robustness with open-source and smaller-scale LLMs, suggesting practical deployment viability. Collectively, LogPrompt offers a scalable, interpretable solution for maintenance and operations tasks in diverse software systems, with public code to foster further adoption and research.

Abstract

Automated log analysis is crucial in modern software-intensive systems for facilitating program comprehension throughout software maintenance and engineering life cycles. Existing methods perform tasks such as log parsing and log anomaly detection by providing a single prediction value without interpretation. However, given the increasing volume of system events, the limited interpretability of analysis results hinders analysts' comprehension of program status and their ability to take appropriate actions. Moreover, these methods require substantial in-domain training data, and their performance declines sharply (by up to 62.5%) in online scenarios involving unseen logs from new domains, a common occurrence due to rapid software updates. In this paper, we propose LogPrompt, a novel interpretable log analysis approach for online scenarios. LogPrompt employs large language models (LLMs) to perform online log analysis tasks via a suite of advanced prompt strategies tailored for log tasks, which enhances LLMs' performance by up to 380.7% compared with simple prompts. Experiments on nine publicly available evaluation datasets across two tasks demonstrate that LogPrompt, despite requiring no in-domain training, outperforms existing approaches trained on thousands of logs by up to 55.9%. We also conduct a human evaluation of LogPrompt's interpretability with six practitioners, each with over 10 years of experience, who rated the generated content highly in terms of usefulness and readability (4.42/5 on average). LogPrompt also exhibits remarkable compatibility with open-source and smaller-scale LLMs, making it flexible for practical deployment. The code for LogPrompt is available at https://github.com/lunyiliu/LogPrompt.

Paper Structure (46 sections, 5 figures, 8 tables)

This paper contains 46 sections, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Illustration of utilizing a simple prompt for log anomaly detection with a large language model. [X] represents the input slot, while [Z] denotes the answer slot.
  • Figure 2: The average performance of existing log analysis approaches deteriorates when in-domain logs for training are limited in availability.
  • Figure 3: Prompts advised by ChatGPT for the task of log parsing (Dialogue date: 2023/02/25).
  • Figure 4: Performance comparison of the five prompt candidates raised in Fig. \ref{fig4} for log parsing.
  • Figure 5: Variation in F1-score and Precision of the in-context prompt with an increase in the number of provided logs.
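
The simple-prompt setup described in the Figure 1 caption, with an input slot [X] and an answer slot [Z], can be sketched roughly as follows. This is a minimal illustration only: the template wording, the helper names `fill_prompt` and `parse_answer`, and the sample log line are all assumptions made for this sketch, not the prompts or code used by LogPrompt, and no LLM is actually called.

```python
# Hypothetical template: the wording is an assumption, not the paper's prompt.
# [X] is the input slot for the raw log; [Z] is the answer slot the model fills.
SIMPLE_PROMPT = (
    "Classify the following log as normal or abnormal.\n"
    "Log: [X]\n"
    "Answer: [Z]"
)


def fill_prompt(template: str, log_line: str) -> str:
    """Insert the log into [X] and cut the template at [Z] so the model completes it."""
    # Split before substituting, so a log that happens to contain "[Z]" is unaffected.
    head, _, _ = template.partition("[Z]")
    return head.replace("[X]", log_line).rstrip()


def parse_answer(completion: str) -> str:
    """Map a free-text completion to a label, or 'unknown' if none is recognised."""
    text = completion.strip().lower()
    # Check "abnormal" first; "normal" is tested separately with startswith.
    if text.startswith("abnormal"):
        return "abnormal"
    if text.startswith("normal"):
        return "normal"
    return "unknown"


if __name__ == "__main__":
    # Made-up log line, for illustration only.
    print(fill_prompt(SIMPLE_PROMPT, "disk failure detected on node 3"))
    print(parse_answer("Abnormal. The log reports a disk failure."))
```

Fixing the answer format up front, as `parse_answer` assumes, is what makes a free-text completion usable as a prediction; the paper's prompt strategies go beyond this simple baseline.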