LogLead -- Fast and Integrated Log Loader, Enhancer, and Anomaly Detector

Mika Mäntylä, Yuqing Wang, Jesse Nyyssölä

TL;DR

LogLead is an integrated, open-source framework for end-to-end log analysis benchmarking that unites loading, enhancement, and anomaly detection in a Polars-based pipeline. It covers 8 public datasets, 7 enhancement techniques, and 11 detectors, enabling over 600 configurations. Log loading is over 10x faster than with past solutions, and offloading message normalization to LogLead speeds up parsing. Empirical results indicate that advanced log representations offer limited gains for anomaly detection on certain datasets, suggesting that lightweight representations can be effective in practice. The tool emphasizes extensibility, performance, and reproducibility to accelerate research in log anomaly detection and benchmarking across diverse data sources. The paper also discusses current limitations and future work, including additional loaders, enhancers, and datasets, and the integration of further detectors such as DeepLog.

Abstract

This paper introduces LogLead, a tool designed for efficient log analysis benchmarking. LogLead combines three essential steps in log processing: loading, enhancing, and anomaly detection. The tool leverages Polars, a high-speed DataFrame library. We currently have loaders for eight publicly available systems (HDFS, Hadoop, BGL, Thunderbird, Spirit, Liberty, TrainTicket, and GC Webshop). The enhancers include three parsers (Drain, Spell, LenMa), BERT embedding creation, and other log representation techniques such as bag-of-words. LogLead integrates with five supervised and four unsupervised machine learning algorithms for anomaly detection from SKLearn. By integrating diverse datasets, log representation methods, and anomaly detectors, LogLead facilitates comprehensive benchmarking in log analysis research. We show that log loading from raw file to dataframe is over 10x faster with LogLead than with past solutions. We demonstrate a roughly 2x improvement in Drain parsing speed by off-loading log message normalization to LogLead. Our brief benchmarking on HDFS indicates that log representations extending beyond the bag-of-words approach offer limited additional benefits. Tool URL: https://github.com/EvoTestOps/LogLead
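
To make the normalization and bag-of-words steps mentioned in the abstract concrete, here is a minimal sketch using only the Python standard library. It does not use LogLead's API or Polars. The mask patterns, placeholder tokens, and function names are assumptions chosen for illustration and are not taken from the tool.

```python
import re
from collections import Counter

# Hypothetical masking rules (an assumption, not LogLead's actual patterns).
# Order matters: specific patterns run before the generic number mask.
MASKS = [
    (re.compile(r"blk_-?\d+"), "<BLK>"),                              # HDFS-style block IDs
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),    # IPv4 with optional port
    (re.compile(r"\b\d+\b"), "<NUM>"),                                # remaining standalone integers
]


def normalize(message: str) -> str:
    """Replace variable parts of a log message with placeholder tokens."""
    for pattern, token in MASKS:
        message = pattern.sub(token, message)
    return message


def bag_of_words(message: str) -> Counter:
    """Count whitespace-separated tokens of the normalized message."""
    return Counter(normalize(message).split())


if __name__ == "__main__":
    line = "Received block blk_-1608999687919862906 of size 91178 from /10.250.10.6"
    print(normalize(line))
    print(bag_of_words(line))
```

Because variable fields are masked before tokenizing, two messages that differ only in block ID, size, or address map to the same bag of words, which keeps the feature space small for a downstream detector.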

Paper Structure (21 sections, 1 figure, 4 tables)