LogELECTRA: Self-supervised Anomaly Detection for Unstructured Logs
Yuuki Yamanaka, Tomokatsu Takahashi, Takuya Minami, Yoshiaki Nakajima
TL;DR
LogELECTRA presents a parser-free, self-supervised approach to log anomaly detection that applies ELECTRA's Replaced Token Detection to individual log lines, enabling immediate per-line detection of point anomalies. It trains on normal logs to learn how tokens are used in context. It then scores each line by how likely its tokens are to have been replaced. This avoids the detection delays inherent in methods that treat anomalies as time-series patterns. Evaluations on BGL, Spirit, and Thunderbird show performance competitive with supervised methods and clear superiority over unsupervised baselines. The method is also robust to unseen templates and to variations in how data are grouped. The approach offers practical benefits for real-time monitoring of large-scale, unstructured logs. It also points to future work on scalability, transferability, and explicit measurement of detection delay in deployment.
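The per-line scoring idea can be illustrated with a minimal sketch. It is not the authors' implementation. The regex tokenizer, the mean aggregation of per-token "replaced" probabilities, the 0.5 threshold, and `toy_discriminator` are all hypothetical. `toy_discriminator` fakes a trained ELECTRA discriminator with a simple vocabulary lookup. The sketch only shows that each line is scored on its own, without waiting for surrounding lines.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical parser-free tokenizer: splits a raw log line into runs of
# letters, runs of digits, and single punctuation symbols (no templates needed).
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def tokenize(line: str) -> List[str]:
    return TOKEN_RE.findall(line)


def line_anomaly_score(replaced_probs: Sequence[float]) -> float:
    """Mean per-token probability of 'replaced' (an assumed aggregation)."""
    if not replaced_probs:
        return 0.0
    return sum(replaced_probs) / len(replaced_probs)


def detect(line: str,
           discriminator: Callable[[List[str]], List[float]],
           threshold: float = 0.5) -> bool:
    """Score one line in isolation, so no window of later lines is awaited."""
    probs = discriminator(tokenize(line))
    return line_anomaly_score(probs) > threshold


# Toy stand-in for a discriminator trained on normal logs: tokens seen in
# normal training lines look "original", unseen tokens look "replaced".
NORMAL_VOCAB = {"RAS", "KERNEL", "INFO", "instruction", "cache",
                "parity", "error", "corrected"}


def toy_discriminator(tokens: List[str]) -> List[float]:
    return [0.05 if t in NORMAL_VOCAB else 0.9 for t in tokens]


print(detect("RAS KERNEL INFO instruction cache parity error corrected",
             toy_discriminator))  # False
print(detect("RAS KERNEL FATAL data TLB error interrupt",
             toy_discriminator))  # True
```

The mean is used only because it is simple. A maximum over tokens, or a sum weighted by token rarity, would be equally plausible choices under the same framing.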
Abstract
System logs are among the most important sources of information for maintaining software systems, which have grown larger and more complex in recent years. Log-based anomaly detection aims to detect system anomalies automatically by analyzing the large volume of logs generated in short periods of time, a critical real-world challenge. Previous studies have used a log parser to extract templates from unstructured log data and have detected anomalies on the basis of template occurrence patterns. These methods are limited when logs contain unknown templates. Furthermore, most log anomalies are known to be point anomalies rather than contextual anomalies. Detection methods based on occurrence patterns must therefore observe a sequence of log lines before deciding, which can introduce unnecessary detection delays. In this paper, we propose LogELECTRA, a new log anomaly detection model based on self-supervised anomaly detection that analyzes each single log line more deeply. LogELECTRA specializes in detecting point anomalies in logs by applying ELECTRA, a natural language processing model, to analyze the semantics of a single log line. In experiments on the public benchmark log datasets BGL, Spirit, and Thunderbird, LogELECTRA outperformed existing state-of-the-art methods.
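ELECTRA's self-supervised signal can be sketched as it might apply to one normal log line. A fraction of tokens is resampled, and a discriminator is trained to label each token as original or replaced with binary cross-entropy. This is an illustration under assumptions, not the paper's training code. `corrupt` and `rtd_loss` are hypothetical helper names. Uniform sampling from a small vocabulary stands in for ELECTRA's masked-language-model generator, and the 15% rate is a common masked-LM choice. Only the discriminator term of ELECTRA's combined loss is shown.

```python
import math
import random
from typing import List, Sequence, Tuple


def corrupt(tokens: Sequence[str], vocab: Sequence[str], mask_rate: float,
            rng: random.Random) -> Tuple[List[str], List[int]]:
    """Build one replaced-token-detection example from a normal log line."""
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            # Stand-in for ELECTRA's small masked-LM generator: sample uniformly.
            new_tok = rng.choice(vocab)
        else:
            new_tok = tok
        corrupted.append(new_tok)
        # As in ELECTRA, a sampled token equal to the original counts as original.
        labels.append(int(new_tok != tok))
    return corrupted, labels


def rtd_loss(probs: Sequence[float], labels: Sequence[int],
             eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted P(replaced) and labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


rng = random.Random(0)
line = "RAS KERNEL INFO instruction cache parity error corrected".split()
vocab = sorted(set(line)) + ["FATAL", "TLB", "interrupt"]
corrupted, labels = corrupt(line, vocab, mask_rate=0.15, rng=rng)
print(corrupted, labels)
```

Because training uses only normal lines, a discriminator trained this way learns which tokens fit their context in normal logs. Tokens it flags as "replaced" in a new line are then evidence that the line is anomalous.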
