LogLSHD: Fast Log Parsing with Locality-Sensitive Hashing and Dynamic Time Warping

Shu-Wei Huang, Xingfang Wu, Heng Li

TL;DR

LogLSHD tackles the scalability challenge in log parsing by combining Locality-Sensitive Hashing (LSH) for fast log grouping with Dynamic Time Warping (DTW) for accurate template extraction. On the Loghub-2.0 benchmark, it achieves state-of-the-art or near-state-of-the-art parsing accuracy while significantly reducing parsing time: compared to Drain, average parsing time drops by 73% and average parsing accuracy rises by 15%. Key contributions include a complete four-stage pipeline (preprocessing, LSH-based grouping, cluster merging, and DTW-based template extraction), an extensive evaluation across 14 large-scale datasets, and a detailed analysis of grouping strategies and the impact of DTW. The results demonstrate practical benefits for real-time log analysis and downstream tasks, and replication data and parameter sensitivity discussions are provided for reproducibility.
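
To make the LSH-based grouping stage more concrete, here is a minimal sketch of one common way to group similar log lines: MinHash signatures with banding. This summary does not specify the hashing scheme, so MinHash, the `num_perm` and `bands` parameters, and the helper names `minhash_signature` and `lsh_group` are all assumptions made for illustration, not the authors' implementation.

```python
import hashlib
from collections import defaultdict


def minhash_signature(tokens, num_perm=32):
    """Assumed helper: MinHash signature of a token set.

    For each of num_perm seeded hash functions, keep the minimum hash value
    over all tokens. Sets with high Jaccard similarity tend to agree in many
    signature positions.
    """
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")  # a different salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(tok.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for tok in tokens
        ))
    return tuple(signature)


def lsh_group(logs, num_perm=32, bands=8):
    """Assumed helper: group log lines that share at least one LSH band bucket."""
    rows = num_perm // bands
    buckets = defaultdict(list)
    for idx, line in enumerate(logs):
        tokens = set(line.split()) or {""}  # guard against empty lines
        sig = minhash_signature(tokens, num_perm)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)

    # Union-find: lines that collide in any bucket end up in the same group.
    parent = list(range(len(logs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for members in buckets.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    groups = defaultdict(list)
    for idx in range(len(logs)):
        groups[find(idx)].append(idx)
    return list(groups.values())


logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.1 closed",
    "Disk quota exceeded on volume7",
]
groups = lsh_group(logs)
```

In this sketch, raising `bands` (with fewer rows per band) makes collisions more likely and so lowers the effective similarity threshold; lines with identical token sets always share every bucket.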

Abstract

Large-scale software systems generate vast volumes of system logs that are essential for monitoring, diagnosis, and performance optimization. However, the unstructured nature and ever-growing scale of these logs present significant challenges for manual analysis and automated downstream tasks such as anomaly detection. Log parsing addresses these challenges by converting raw logs into structured formats, enabling efficient log analysis. Despite its importance, existing log parsing methods suffer from limitations in efficiency and scalability, due to the large size of log data and their heterogeneous formats. To overcome these challenges, this study proposes a log parsing approach, LogLSHD, which leverages Locality-Sensitive Hashing (LSH) to group similar logs and integrates Dynamic Time Warping (DTW) to enhance the accuracy of template extraction. LogLSHD demonstrates exceptional efficiency in parsing time, significantly outperforming state-of-the-art methods. For example, compared to Drain, LogLSHD reduces the average parsing time by 73% while increasing the average parsing accuracy by 15% on the LogHub 2.0 benchmark.
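
As an illustration of how DTW can support template extraction, the sketch below aligns two tokenized log lines with classic dynamic time warping and keeps the tokens that match along the alignment path, replacing the rest with a wildcard. The 0/1 token cost, the `<*>` wildcard, the rule that collapses consecutive wildcards, and the function name `dtw_template` are assumptions for this example; the abstract does not describe the exact procedure LogLSHD uses.

```python
def dtw_template(a, b, wildcard="<*>"):
    """Assumed helper: derive a template from two token lists via DTW alignment."""
    n, m = len(a), len(b)
    inf = float("inf")

    # cost[i][j] = minimal alignment cost of a[:i] and b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if a[i - 1] == b[j - 1] else 1
            cost[i][j] = mismatch + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()

    # Matching tokens stay constant; mismatches become a single wildcard.
    template = []
    for ai, bj in path:
        token = a[ai] if a[ai] == b[bj] else wildcard
        if token == wildcard and template and template[-1] == wildcard:
            continue
        template.append(token)
    return template


line_a = "Connection from 10.0.0.1 closed".split()
line_b = "Connection from 10.0.0.2 closed".split()
print(" ".join(dtw_template(line_a, line_b)))  # Connection from <*> closed
```

Because DTW lets one token align with several tokens in the other line, the same sketch also handles lines of different lengths: aligning `user alice logged in` with `user bob smith logged in` yields `user <*> logged in`.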

Paper Structure

This paper contains 41 sections, 9 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The structure of LogLSHD.
  • Figure 2: Illustration of template extraction with DTW.
  • Figure 3: Comparison of log parsing methods with accuracy metrics.
  • Figure 4: Comparison of log parsing methods with parsing time.
  • Figure 5: Impact of LSH Jaccard Threshold on PA and GA.
  • ...and 1 more figure