Distributed Log-driven Anomaly Detection System based on Evolving Decision Making
Zhuoran Tan, Qiyuan Wang, Christos Anagnostopoulos, Shameem P. Parambath, Jeremy Singer, Sam Temple
TL;DR
CEDLog tackles scalable log anomaly detection by combining distributed processing (Airflow and Dask) with continual learning (Elastic Weight Consolidation, EWC) and a human-in-the-loop (HITL) feedback loop. It employs a dual-model detector (an MLP for core features and a GCN for ParameterList graphs) with a fusion stage that computes $F = P(p_1 = 0)\, s_0 + P(p_2 = 0)\, s_1$ and applies a threshold to produce predictions, aiming to improve precision while minimizing false positives. Logs are parsed with Drain, features are engineered via Random Forest selection and semantic embeddings, and a parallel processing pipeline accelerates training and inference; the whole system is deployed in Docker for offline training and online inference. Evaluation on BGL and HDFS demonstrates high precision and low false positives, with EWC reducing catastrophic forgetting during continual updates and the distributed deployment making the system applicable across multiple clients.
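The fusion rule above can be illustrated with a minimal sketch. It rests on several assumptions that this summary does not confirm: $P(p_1 = 0)$ and $P(p_2 = 0)$ are read as the MLP's and GCN's probabilities for class 0, class 0 is taken to mean "normal", $s_0$ and $s_1$ are treated as scalar fusion weights, and a fused score at or above the threshold is mapped to "normal". The function names, default weights, and default threshold are hypothetical.

```python
def fuse(p_mlp_zero, p_gcn_zero, s0=0.5, s1=0.5):
    """Fused score F = P(p1 = 0) * s0 + P(p2 = 0) * s1.

    p_mlp_zero: assumed MLP probability of class 0.
    p_gcn_zero: assumed GCN probability of class 0.
    s0, s1:     assumed scalar fusion weights (illustrative defaults).
    """
    return p_mlp_zero * s0 + p_gcn_zero * s1


def predict(p_mlp_zero, p_gcn_zero, s0=0.5, s1=0.5, threshold=0.5):
    """Threshold the fused score.

    Assumption: class 0 means "normal", so a high fused score yields
    label 0 (normal) and a low one yields label 1 (anomaly).
    """
    f = fuse(p_mlp_zero, p_gcn_zero, s0, s1)
    return 0 if f >= threshold else 1


if __name__ == "__main__":
    # Both models are fairly confident the entry is class 0.
    print(fuse(0.9, 0.8), predict(0.9, 0.8))
    # Both models assign low probability to class 0.
    print(fuse(0.2, 0.1), predict(0.2, 0.1))
```

With equal weights the fused score is simply the mean of the two class-0 probabilities; unequal weights would let one detector dominate the decision.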
Abstract
Effective anomaly detection from logs is crucial for enhancing cybersecurity defenses by enabling the early identification of threats. Despite advances in anomaly detection, existing systems often fall short in areas such as post-detection validation, scalability, and effective maintenance. These limitations not only hinder the detection of new threats but also impair overall system performance. To address these challenges, we propose CEDLog, a novel, practical framework that uses Elastic Weight Consolidation (EWC) for continual learning and combines Apache Airflow and Dask for scalable distributed processing. In CEDLog, anomalies are detected by combining a Multi-layer Perceptron (MLP) and Graph Convolutional Networks (GCNs) that operate on critical features present in event logs. Through comparisons with other update strategies on large-scale datasets, we demonstrate the strengths of CEDLog, showcasing efficient updates and low false positives.
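For readers unfamiliar with EWC, the sketch below shows the standard quadratic penalty it adds to a new task's loss: $\frac{\lambda}{2} \sum_i F_i (\theta_i - \theta_i^*)^2$, where $\theta^*$ are the parameters learned before the update and $F_i$ is a diagonal Fisher information estimate of each parameter's importance. This is the generic EWC formulation and not necessarily how CEDLog implements it; the function names, the flat-list parameter representation, and the value of `lam` are assumptions made for illustration.

```python
def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params:     current parameter values (flat list of floats).
    old_params: parameter values saved after the previous training round.
    fisher:     diagonal Fisher information estimates, one per parameter.
    lam:        penalty strength (hypothetical default).
    """
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, fisher)
    )


def total_loss(task_loss, params, old_params, fisher, lam=1.0):
    """Loss on the new data plus the EWC penalty."""
    return task_loss + ewc_penalty(params, old_params, fisher, lam)


if __name__ == "__main__":
    old = [0.0, 2.0]
    new = [1.0, 2.0]
    importance = [2.0, 5.0]
    # Only the first parameter moved, so only it contributes to the penalty.
    print(ewc_penalty(new, old, importance))
    print(total_loss(0.3, new, old, importance))
```

Parameters with a large Fisher value are expensive to move, which is how the penalty discourages overwriting what earlier data taught the model while leaving less important parameters free to adapt.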
