On the Effectiveness of Log Representation for Log-based Anomaly Detection

Xingfang Wu, Heng Li, Foutse Khomh

TL;DR

This work examines how different log representation techniques affect ML-based log analysis, focusing on log-based anomaly detection. It presents an empirical study that evaluates six representations with seven models on four public datasets, and also examines the effects of log parsing and feature aggregation. The key finding is that no single representation is best across all settings. Classical representations such as Message Count Vector perform well with traditional models, while contextual embeddings such as BERT often benefit deep models. Log parsing generally improves representations, and the effect of the aggregation choice is dataset-specific. These insights offer practical guidance for designing automated log-analysis pipelines and are supported by a replication package.

Abstract

Logs are an essential source of information for people to understand the running status of a software system. As modern software architectures and maintenance methods evolve, more research effort has been devoted to automated log analysis. In particular, machine learning (ML) has been widely used in log analysis tasks. In ML-based log analysis tasks, converting textual log data into numerical feature vectors is a critical and indispensable step. However, the impact of different log representation techniques on the performance of the downstream models is not clear, which limits researchers' and practitioners' ability to choose the optimal log representation techniques for their automated log analysis workflows. Therefore, this work investigates and compares the log representation techniques commonly adopted in previous log analysis research. In particular, we select six log representation techniques and evaluate them with seven ML models and four public log datasets (i.e., HDFS, BGL, Spirit and Thunderbird) in the context of log-based anomaly detection. We also examine the impacts of the log parsing process and of different feature aggregation approaches when they are employed with log representation techniques. From the experiments, we provide some heuristic guidelines for future researchers and developers to follow when designing an automated log analysis workflow. We believe our comprehensive comparison of log representation techniques can help researchers and practitioners better understand the characteristics of different log representation techniques and provide them with guidance for selecting the most suitable ones for their ML-based log analysis workflow.
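
To make the step of "converting textual log data into numerical feature vectors" concrete, the following is a minimal sketch of one classical representation, a message count vector, assuming logs have already been parsed into event-template IDs and grouped into sessions. The function name, the session IDs, and the event IDs (`E1`, `E2`, `E3`) are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from collections import Counter


def message_count_vectors(sessions, vocabulary=None):
    """Turn sequences of event-template IDs into fixed-length count vectors.

    sessions: dict mapping a session ID to a list of event-template IDs.
    vocabulary: optional ordered list of known event IDs; if omitted, it is
    built from the sessions themselves (an assumption made for this sketch).
    """
    if vocabulary is None:
        vocabulary = sorted({event for seq in sessions.values() for event in seq})
    index = {event: i for i, event in enumerate(vocabulary)}

    vectors = {}
    for session_id, seq in sessions.items():
        vec = [0] * len(vocabulary)
        for event, n in Counter(seq).items():
            # Events outside the vocabulary are ignored in this sketch.
            if event in index:
                vec[index[event]] = n
        vectors[session_id] = vec
    return vocabulary, vectors


# Hypothetical parsed sessions: each entry is a sequence of event-template IDs.
sessions = {
    "blk_1": ["E1", "E2", "E2", "E3"],
    "blk_2": ["E1", "E3", "E3", "E3"],
}
vocab, vectors = message_count_vectors(sessions)
print(vocab)
print(vectors)
```

Each session becomes one vector whose length equals the number of distinct event templates, so it can be passed directly to a traditional classifier. The order of events within a session is discarded, which is one way such a representation differs from sequence-level embeddings.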

Paper Structure (49 sections, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Different levels of abstraction of log representation.
  • Figure 2: General workflow of our experiments. The variations for each research question are highlighted with dotted boxes.
  • Figure 3: Results of logistic regression model with different grouping settings on Thunderbird dataset.
  • Figure 4: Comparison of performances of the studied anomaly detection models using the Word2Vec and FastText representations that are generated from parsed and unparsed logs.
  • Figure 5: Visualization of representations generated with FastText using t-SNE. 200 positive (red) and negative (green) samples are randomly sampled from the HDFS dataset.
  • ...and 2 more figures