LogLM: From Task-based to Instruction-based Automated Log Analysis

Yilun Liu, Yuhe Ji, Shimin Tao, Minggui He, Weibin Meng, Shenglin Zhang, Yongqian Sun, Yuming Xie, Boxing Chen, Hao Yang

TL;DR

LogLM reframes automated log analysis as instruction following rather than task-specific modeling. It unifies log parsing, anomaly detection, log interpretation, root cause analysis, and solution recommendation in a single instruction-tuned model trained on cross-task and cross-domain data. The model is built on an open-source foundation (LLaMA-2-7B) and outperforms existing baselines across all five capabilities, while generalizing well to unseen tasks and complex instructions. The methodology combines a two-tier capability design with an instruction dataset built from diverse sources, and the training objective maximizes the likelihood of the target response given the instruction. Deployment in Huawei’s O&M platform and open data releases support real-world applicability and future extension to industrial log analysis.

Abstract

Automatic log analysis is essential for the efficient Operation and Maintenance (O&M) of software systems, providing critical insights into system behaviors. However, existing approaches mostly treat log analysis as training a model to perform an isolated task (e.g., anomaly detection or log parsing) using task-specific log-label pairs. These task-based approaches generalize poorly to complex scenarios, depend on task-specific training data, and incur significant costs when multiple models must be deployed. In this paper, we propose an instruction-based training approach that transforms log-label pairs from multiple tasks and domains into a unified format of instruction-response pairs. Our trained model, LogLM, can follow complex user instructions and generalize better across different tasks, thereby increasing flexibility and reducing the dependence on task-specific training data. By integrating major log analysis tasks into a single model, our approach also relieves the model deployment burden. Experimentally, LogLM outperforms existing approaches across five log analysis capabilities and exhibits strong generalization abilities on complex instructions and unseen tasks.
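
The conversion of task-specific log-label pairs into a unified instruction-response format can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the template wording, the task keys, the `<*>` placeholder convention, the example logs, and the names `InstructionPair`, `TEMPLATES`, `to_instruction_pair`, and `build_dataset` are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class InstructionPair:
    """One training example in the unified instruction-response format."""
    instruction: str
    response: str


# Hypothetical instruction templates, one per task.
# The wording used in the paper's dataset is not reproduced here.
TEMPLATES = {
    "log_parsing": "Extract the template of the following log, "
                   "replacing variable parts with <*>.",
    "anomaly_detection": "Decide whether the following log is normal or abnormal.",
}


def to_instruction_pair(task: str, log: str, label: str) -> InstructionPair:
    """Wrap a (log, label) pair from a given task into an instruction-response pair."""
    if task not in TEMPLATES:
        raise ValueError(f"no instruction template for task: {task}")
    instruction = f"{TEMPLATES[task]}\nLog: {log}"
    return InstructionPair(instruction=instruction, response=label)


def build_dataset(records: Iterable[Tuple[str, str, str]]) -> List[InstructionPair]:
    """Merge (task, log, label) records from several tasks into one dataset."""
    return [to_instruction_pair(task, log, label) for task, log, label in records]


if __name__ == "__main__":
    # Made-up example records; they are not taken from any real log dataset.
    records = [
        ("log_parsing", "Connection from 10.0.0.1 closed", "Connection from <*> closed"),
        ("anomaly_detection", "disk failure on node 3", "abnormal"),
    ]
    for pair in build_dataset(records):
        print(pair.instruction)
        print("->", pair.response)
```

Because every task shares one record shape, examples from different tasks and domains can be mixed in a single training set, which is what allows one model to replace several task-specific ones.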

Paper Structure

This paper contains 38 sections, 1 equation, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Illustrated comparison between (a) existing task-based log analysis approaches and (b) our LogLM, an instruction-based log analysis model.
  • Figure 2: Comparison of average performance between LogLM and existing methods across five log analysis capabilities. The dots indicate the relative percentage of baselines' average performances in comparison to LogLM's. LogLM-7B is fine-tuned from an open-source LLM with 7B parameters.
  • Figure 3: Illustration of the capability composition, training dataset construction, and training of LogLM.
  • Figure 4: Three cases of LogLM-7B responding to complex user instructions: (a) log-related Q&A, (b) unseen new task, and (c) combination of tasks.
  • Figure 5: Ablation study on the training data of LogLM, evaluated on (a) Log Parsing and (b) Anomaly Detection. Additional results (the other three tasks, and an upsampling group that controls for data quantity) are available on our GitHub page.