CyberCScope: Mining Skewed Tensor Streams and Online Anomaly Detection in Cybersecurity Systems

Kota Nakamura, Koki Kawabata, Shungo Tanaka, Yasuko Matsubara, Yasushi Sakurai

TL;DR

The paper tackles real-time anomaly detection in skewed, high-order tensor streams from cybersecurity systems, which combine categorical attributes with skewed continuous ones. It introduces CyberCScope, a streaming framework built on the OP-SiFi decomposition, which extracts major trends across skewed infinite and finite dimensional spaces and learns multiple time-evolving regimes. A regime-based compact description (C) and MDL-based model compression support online anomaly scoring that adapts to regime switches, so the method can detect multiple intrusion types and give each an interpretable signature. Experiments on the large-scale CI'17 and CCI'18 datasets report higher accuracy (e.g., ROC-AUC and PR-AUC) than state-of-the-art baselines, along with linear scalability that is fast enough for real-time use in proactive cybersecurity monitoring.

Abstract

Cybersecurity systems continuously produce a huge number of time-stamped events in the form of high-order tensors, such as {count; time, port, flow duration, packet size, ...}. How can we detect anomalies/intrusions in real time? How can we identify multiple types of intrusions and capture their characteristic behaviors? The tensor data consists of categorical and continuous attributes, and the distributions of the continuous attributes are typically skewed. These properties require handling skewed infinite and finite dimensional spaces simultaneously. In this paper, we propose a novel streaming method, namely CyberCScope. The method effectively decomposes incoming tensors into major trends while explicitly distinguishing between categorical and skewed continuous attributes. To our knowledge, it is the first to compute a hybrid skewed infinite and finite dimensional decomposition. Based on this decomposition, it finds distinct time-evolving patterns in a streaming fashion, enabling the detection of multiple types of anomalies. Extensive experiments on large-scale real datasets demonstrate that CyberCScope detects various intrusions with higher accuracy than state-of-the-art baselines while providing meaningful summaries of the intrusions that occur in practice.
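
To make the tensor form {count; time, port, flow duration, packet size} concrete, the following is a minimal sketch of aggregating time-stamped events into a sparse count tensor. It is not the paper's method: the names `log_bin` and `build_count_tensor`, the 60-second window, and the toy events are all hypothetical, and discretizing the continuous attributes into logarithmic bins is an assumption made only for illustration (the abstract describes treating them as an infinite dimensional space instead).

```python
import math
from collections import Counter


def log_bin(value):
    """Map a non-negative continuous value to a base-2 logarithmic bin index.

    Values below 1 share bin 0; skewed attributes then spread over few bins.
    Binning is an assumption of this sketch, not the paper's treatment.
    """
    if value < 1.0:
        return 0
    return math.floor(math.log2(value)) + 1


def build_count_tensor(events, window_seconds=60):
    """Aggregate (timestamp, port, duration, packet_size) events into a
    sparse 4th-order count tensor keyed by
    (time window, port, duration bin, packet-size bin)."""
    tensor = Counter()
    for timestamp, port, duration, packet_size in events:
        key = (
            int(timestamp // window_seconds),  # time mode
            port,                              # categorical mode
            log_bin(duration),                 # continuous mode, binned
            log_bin(packet_size),              # continuous mode, binned
        )
        tensor[key] += 1
    return tensor


# Hypothetical toy events: (timestamp [s], port, duration [s], packet size [bytes])
events = [
    (3.0, 80, 0.4, 1500.0),
    (12.5, 80, 0.7, 1400.0),
    (45.0, 21, 30.0, 60.0),
    (61.0, 21, 35.0, 64.0),
]
tensor = build_count_tensor(events)
for key, count in sorted(tensor.items()):
    print(key, count)
```

Storing only non-zero cells in a dictionary keeps memory proportional to the number of distinct attribute combinations seen, which matters because most cells of such a high-order tensor are empty.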

Paper Structure

This paper contains 9 sections, 2 equations, 4 figures, 1 algorithm.

Figures (4)

  • Figure 1: The distribution of a continuous attribute is right-skewed: it deviates from the Gaussian distribution defined by its empirical mean and variance.
  • Figure 2: Overview of OP-SiFi decomposition.
  • Figure 3: Real-time intrusion detection by CyberCScope on the CCI'18 dataset: the stars indicate intrusions. It successfully identified multiple types of intrusions (e.g., #2 and #5: FTP-BruteForce, #3: DoS Slowloris, and #8: DoS Hulk).
  • Figure 4: CyberCScope captures the characteristic behavior of DoS Slowloris: Component #10 shifts to a larger value.
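
The mismatch described in the Figure 1 caption can be reproduced with a small sketch. It uses synthetic data only: the Gamma distribution, its parameters, the seed, and the name `sample_skewness` are assumptions chosen for illustration, not values or code from the paper. The sketch draws right-skewed, non-negative "durations", fits a Gaussian from the empirical mean and standard deviation, and measures how poorly that Gaussian matches the sample.

```python
import random
import statistics


def sample_skewness(values):
    """Return the biased sample skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Synthetic, right-skewed sample (assumed Gamma with shape 0.5, scale 10).
rng = random.Random(0)
durations = [rng.gammavariate(0.5, 10.0) for _ in range(20000)]

# Gaussian defined by the empirical mean and standard deviation.
mean = statistics.fmean(durations)
std = statistics.pstdev(durations)
gaussian = statistics.NormalDist(mean, std)

skew = sample_skewness(durations)
median = statistics.median(durations)
# Probability the fitted Gaussian assigns to impossible negative durations.
mass_below_zero = gaussian.cdf(0.0)

print(f"skewness = {skew:.2f}")
print(f"mean = {mean:.2f}, median = {median:.2f}")
print(f"Gaussian mass below zero = {mass_below_zero:.2f}")
```

A positive skewness, a median well below the mean, and a sizeable share of Gaussian probability on negative values are three simple signs that a symmetric model is a poor fit for this kind of attribute.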

Theorems & Definitions (2)

  • Definition 1: Regime: $\theta$
  • Definition 2: Compact description