LogSHIELD: A Graph-based Real-time Anomaly Detection Framework using Frequency Analysis

Krishna Chandra Roy, Qian Chen

TL;DR

This work proposes LogSHIELD, a graph-based anomaly detection model for host data that detects stealthy and sophisticated attacks with over 98% average AUC and F1 scores and outperforms state-of-the-art models in detection time.

Abstract

Anomaly-based cyber threat detection using deep learning is steadily growing in popularity for novel cyber-attack detection and forensics. A robust, efficient, real-time threat detector for a large-scale operational enterprise network requires a model with high accuracy, high fidelity, and high throughput to detect malicious activities. Traditional anomaly-based detection models, however, suffer from high computational overhead and low detection accuracy, making them unsuitable for real-time threat detection. In this work, we propose LogSHIELD, a highly effective graph-based anomaly detection model for host data. We present a real-time threat detection approach using frequency-domain analysis of provenance graphs. To demonstrate the significance of graph-based frequency analysis, we propose two approaches. Approach-I uses a Graph Neural Network (GNN), LogGNN, and Approach-II performs frequency-domain analysis on graph node samples for graph embedding. Both approaches use a statistical clustering algorithm for anomaly detection. The proposed models are evaluated on a large host log dataset consisting of 774M benign logs and 375K malware logs. LogSHIELD explores the provenance graph to extract contextual and causal relationships among logs, exposing abnormal activities. It detects stealthy and sophisticated attacks with over 98% average AUC and F1 scores. It significantly improves throughput, achieves an average detection latency of 0.13 seconds, and outperforms state-of-the-art models in detection time.
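To make the idea behind Approach-II more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a sequence of node samples has already been drawn from a provenance graph and encoded as numbers, embeds that sequence as its FFT magnitude spectrum, and flags embeddings that deviate strongly from a benign profile. The names `frequency_embedding`, `fit_benign_profile`, `anomaly_score`, and `N_FFT`, the synthetic signals, and the z-score threshold (a simplified stand-in for the statistical clustering step the abstract mentions) are all assumptions made for illustration.

```python
import numpy as np

N_FFT = 16  # hypothetical fixed length of a node-sample sequence


def frequency_embedding(node_samples, n_fft=N_FFT):
    """Map a numeric node-sample sequence to its FFT magnitude spectrum."""
    x = np.asarray(node_samples, dtype=float)
    x = x - x.mean()                         # remove the constant (DC) offset
    return np.abs(np.fft.rfft(x, n=n_fft))   # n_fft // 2 + 1 magnitudes


def fit_benign_profile(embeddings, eps=1e-8):
    """Estimate per-dimension mean and standard deviation from benign embeddings."""
    e = np.asarray(embeddings, dtype=float)
    return e.mean(axis=0), e.std(axis=0) + eps  # eps avoids division by zero


def anomaly_score(embedding, mean, std):
    """Euclidean norm of the z-scored embedding; larger means more unusual."""
    return float(np.linalg.norm((embedding - mean) / std))


# Synthetic stand-in data (assumption: NOT taken from the paper's dataset).
rng = np.random.default_rng(0)
t = np.arange(N_FFT)
benign = [np.sin(2 * np.pi * t / 8) + 0.05 * rng.standard_normal(N_FFT)
          for _ in range(200)]

# Build the benign profile and a simple threshold from benign scores.
benign_emb = np.array([frequency_embedding(s) for s in benign])
mean, std = fit_benign_profile(benign_emb)
benign_scores = np.array([anomaly_score(e, mean, std) for e in benign_emb])
threshold = benign_scores.mean() + 3 * benign_scores.std()

# A sequence with a very different frequency content (rapid alternation).
suspicious = (-1.0) ** t
suspicious_score = anomaly_score(frequency_embedding(suspicious), mean, std)
print("flagged as anomalous:", suspicious_score > threshold)
```

The sketch works on the magnitude spectrum only, so two sequences with the same periodic structure but shifted in time map to nearly the same embedding. Whether LogSHIELD uses magnitudes, phases, or another spectral representation is not stated in the abstract.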

Paper Structure

This paper contains 26 sections, 11 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Approach-I: System architecture of LogSHIELD with LogGNN graph embedding.
  • Figure 2: A host log example for EventID 4634. Red boxes are event fields and green boxes are values of the corresponding fields.
  • Figure 3: A provenance graph constructed using a small fraction of daily benign logs of User P1, P2, and P3.
  • Figure 4: Log parsing workflow for approach-I. A random set of log event fields is shown in the parsing workflow.
  • Figure 5: Approach-II: System Architecture of LogSHIELD with frequency domain analysis (FDA).
  • ...and 1 more figure