FastLogAD: Log Anomaly Detection with Mask-Guided Pseudo Anomaly Generation and Discrimination

Yifei Lin, Hanqiu Deng, Xingyu Li

TL;DR

FastLogAD targets fast, unsupervised log anomaly detection by pairing a generator that produces pseudo anomalies with a discriminative one-class detector. The Mask-Guided Anomaly Generation (MGAG) model creates realistic perturbations of normal log sequences, and the discriminator is trained with RTD and HST to cluster normal embeddings tightly while pushing anomalies outward, with the norm of the CLS embedding serving as the anomaly score. The threshold ε is set from normal validation data, ε = quantile_{0.99}(||φ_{θ_D}(s)_{cls}||_2), so no real anomalies are needed to choose it. On the HDFS, BGL, and Thunderbird datasets, FastLogAD achieves competitive or superior F1 scores and at least 10x faster anomaly detection than prior methods, which makes it a practical candidate for real-time, domain-specific log analysis.
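
The thresholding rule in the summary above can be illustrated with a minimal sketch. It assumes the CLS embeddings have already been produced by the trained discriminator; the helper names (`l2_norm`, `quantile`, `fit_threshold`, `is_anomalous`), the linear-interpolation quantile, and the toy 2-D embeddings are illustrative assumptions, not taken from the authors' implementation.

```python
import math

def l2_norm(vec):
    """Euclidean norm of an embedding given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def quantile(values, q):
    """Quantile of a non-empty list with linear interpolation, 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac

def fit_threshold(normal_cls_embeddings, q=0.99):
    """epsilon = q-quantile of the CLS embedding norms on normal validation data."""
    return quantile([l2_norm(e) for e in normal_cls_embeddings], q)

def is_anomalous(cls_embedding, eps):
    """Flag a sequence whose CLS embedding norm exceeds the threshold."""
    return l2_norm(cls_embedding) > eps

# Toy stand-in for discriminator outputs on normal validation sequences
# (hypothetical values with norms 0.1, 0.2, ..., 1.0).
validation = [[0.1 * i, 0.0] for i in range(1, 11)]
eps = fit_threshold(validation)

print(is_anomalous([0.3, 0.4], eps))  # norm 0.5, below the threshold
print(is_anomalous([3.0, 4.0], eps))  # norm 5.0, above the threshold
```

Because ε is computed only from normal validation data, nothing in this step requires labeled anomalies; the 0.99 quantile means roughly 1% of normal validation sequences fall above the threshold by construction.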

Abstract

Large computer systems now emit logs extensively to record their runtime status, and it has become crucial to identify suspicious or malicious activity from these real-time logs. Because manual inspection is infeasible at this scale, fast automated log anomaly detection is necessary. Most existing unsupervised methods are trained only on normal log data, but they usually require either additional abnormal data for hyperparameter selection or auxiliary datasets for discriminative model optimization. In this paper, aiming for a highly effective discriminative model that enables rapid anomaly detection, we propose FastLogAD, a generator-discriminator framework that generates pseudo-abnormal logs through the Mask-Guided Anomaly Generation (MGAG) model and efficiently identifies anomalous logs via the Discriminative Abnormality Separation (DAS) model. In particular, pseudo-abnormal logs are generated by replacing randomly masked tokens in a normal sequence with unlikely candidates. During the discriminative stage, FastLogAD learns a distinct separation between normal and pseudo-abnormal samples based on their embedding norms, which allows a threshold to be selected without exposure to any test data while achieving competitive performance. Extensive experiments on several common benchmarks show that FastLogAD outperforms existing anomaly detection approaches. Furthermore, FastLogAD detects anomalies at least 10x faster than previous methods. Our implementation is available at https://github.com/YifeiLin0226/FastLogAD.
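
To make the pseudo-anomaly step concrete, here is a minimal sketch of replacing randomly masked tokens with unlikely candidates. Everything in it is an assumption for illustration: the real generator is a trained model, whereas `unigram_generator` below is a toy frequency table standing in for it, and picking the single least likely token is just one possible reading of "unlikely candidates". The function names, the `mask_ratio` default, and the event IDs are hypothetical.

```python
import random
from collections import Counter

def unigram_generator(normal_sequences):
    """Toy stand-in for the generator: the same token distribution at every position."""
    counts = Counter(tok for seq in normal_sequences for tok in seq)
    total = sum(counts.values())
    probs = {tok: c / total for tok, c in counts.items()}
    return lambda position, sequence: probs

def make_pseudo_anomaly(sequence, token_probs, mask_ratio=0.3, rng=None):
    """Mask a random subset of positions and fill each with an unlikely token.

    token_probs(position, sequence) must return a dict token -> probability.
    Assumes a non-empty sequence and 0 < mask_ratio <= 1.
    """
    rng = rng or random.Random(0)
    n_mask = max(1, round(mask_ratio * len(sequence)))
    positions = rng.sample(range(len(sequence)), n_mask)
    corrupted = list(sequence)
    for pos in positions:
        probs = token_probs(pos, sequence)
        # Exclude the original token so the masked position really changes.
        candidates = {t: p for t, p in probs.items() if t != sequence[pos]}
        corrupted[pos] = min(candidates, key=candidates.get)
    return corrupted, sorted(positions)

# Hypothetical normal sequences of log event IDs.
normal = [["E1", "E2", "E3", "E2"], ["E1", "E2", "E2", "E4"], ["E1", "E2", "E3", "E2"]]
generator = unigram_generator(normal)

original = ["E1", "E2", "E3", "E2"]
pseudo, masked_positions = make_pseudo_anomaly(original, generator, mask_ratio=0.5)
print(original, "->", pseudo, "changed at", masked_positions)
```

The discriminator would then be trained to separate `original` from `pseudo`, so no real abnormal logs are needed at training time.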

Paper Structure

This paper contains 30 sections, 8 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Examples of normal and abnormal log sequences.
  • Figure 2: Illustration of the complete pipeline of the proposed log anomaly detection solution. Taking logs from the Hadoop Distributed File System (HDFS) dataset (Xu et al., 2009) as examples, log templates are extracted through log parsing and then grouped into sequences by their block IDs. A vocabulary is created to map the log events and special tokens (e.g., [cls], [mask]) to unique indices during model training. The normal log training data is then passed to the log anomaly detection module for model optimization. During inference, the vocabulary is fixed and is used to construct query log sequences after log parsing and grouping. The log anomaly detection model then returns a yes/no answer. The specific architecture of our log anomaly detection module is presented in Fig. 3 and Fig. 4.
  • Figure 3: The training procedure of FastLogAD. For a given sequence of normal logs, we randomly mask log tokens at a certain ratio and then generate the corresponding log sequence through a generator. For the discriminator, we propose RTD and HST to learn to distinguish normal logs from pseudo-anomaly logs.
  • Figure 4: The inference procedure of FastLogAD. In inference, we directly use the anomaly discriminator for efficient diagnosis of logs.
  • Figure 5: Visualization of anomaly probability distributions on HDFS, BGL and Thunderbird datasets.
  • ...and 1 more figure
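
The preprocessing pipeline described in the Figure 2 caption (parse, group by block ID, build a vocabulary, encode) can be sketched as follows. This is a simplified illustration under stated assumptions: log parsing is taken as already done, so the input is a list of hypothetical `(block_id, template_id)` pairs; the `[pad]` and `[unk]` tokens and the rule that unseen templates map to `[unk]` at inference are assumptions, since the caption names only `[cls]` and `[mask]` as examples of special tokens.

```python
from collections import defaultdict

# [cls] and [mask] come from the caption; [pad] and [unk] are assumed extras.
SPECIAL_TOKENS = ["[pad]", "[cls]", "[mask]", "[unk]"]

def group_by_block(parsed_logs):
    """Group parsed log events into per-block sequences, preserving log order."""
    sequences = defaultdict(list)
    for block_id, template in parsed_logs:
        sequences[block_id].append(template)
    return dict(sequences)

def build_vocab(sequences):
    """Map special tokens and every template seen in training to a unique index."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for seq in sequences.values():
        for template in seq:
            vocab.setdefault(template, len(vocab))
    return vocab

def encode(seq, vocab):
    """Prepend [cls] and look up indices; the vocabulary is not extended here."""
    return [vocab["[cls]"]] + [vocab.get(t, vocab["[unk]"]) for t in seq]

# Hypothetical parsed training logs: (block ID, template ID) pairs.
train_logs = [
    ("blk_1", "E5"), ("blk_2", "E5"), ("blk_1", "E22"),
    ("blk_2", "E11"), ("blk_1", "E11"),
]
train_sequences = group_by_block(train_logs)
vocab = build_vocab(train_sequences)

# At inference the vocabulary stays fixed, so an unseen template falls back to [unk].
query = encode(["E5", "E99"], vocab)
print(train_sequences)
print(query)
```

Keeping the vocabulary fixed after training means query sequences are always encoded against the same indices the model was optimized with.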