SDOoop: Capturing Periodical Patterns and Out-of-phase Anomalies in Streaming Data Analysis

Alexander Hartl, Félix Iglesias Vázquez, Tanja Zseby

TL;DR

SDOoop extends the streaming version of SDO so that the model retains temporal information about data structures. It conforms to next-generation machine learning, which is expected to provide highly interpretable and informative models in addition to accuracy and speed.

Abstract

Streaming data analysis is increasingly required in applications such as IoT, cybersecurity, robotics, mechatronics and cyber-physical systems. Despite its relevance, it is still an emerging field with open challenges. SDO is a recent anomaly detection method designed to meet requirements of speed, interpretability and intuitive parameterization. In this work, we present SDOoop, which extends the capabilities of SDO's streaming version to retain temporal information of data structures. SDOoop spots contextual anomalies that are undetectable by traditional algorithms, while enabling the inspection of data geometries, clusters and temporal patterns. We used SDOoop to model real network communications in critical infrastructures and to extract patterns that disclose their dynamics. Moreover, we evaluated SDOoop with data from intrusion detection and natural science domains and obtained performance equivalent or superior to state-of-the-art approaches. Our results show the high potential of new model-based methods to analyze and explain streaming data. Since SDOoop operates with constant per-sample space and time complexity, it is well suited to big data and can process large volumes of information as they arrive. SDOoop conforms to next-generation machine learning, which, in addition to accuracy and speed, is expected to provide highly interpretable and informative models.

Paper Structure (15 sections, 3 theorems, 7 equations, 7 figures, 3 tables, 1 algorithm)

Key Result

Lemma 1

For an observer $\boldsymbol{\omega} \in \Omega$, let $g(t)\in \mathbb R^+$ denote the expected rate of arriving data points, for which $\boldsymbol{\omega}$ is contained in $\mathcal{N}$ at time $t$. If $g(t)$ is a $T_0$-periodic function and $T\gg T_0$, observations $P_{\boldsymbol{\omega},n}$ app

Figures (7)

  • Figure 1: Example of a data stream, a model with two observers (red and orange), and three types of anomalies (blue): local (left), contextual (middle), global (right).
  • Figure 2: Normal outliers and contextual outliers (aka out-of-phase outliers) in synthetic data for a fraction of contextual outliers of 0.5%.
  • Figure 3: OD performance vs contextual outlier rate in the proof of concept.
  • Figure 4: Learned magnitude spectrum (left), one-hour temporal plots (middle) and 24-hour temporal plots (right) for four exemplary observers when processing network data captured in an e-charging infrastructure.
  • Figure 5: Outlier scores of network data from an e-charging infrastructure.
  • ...and 2 more figures
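
The contextual (out-of-phase) outliers named in Figures 1 and 2 are points that look normal in value but arrive at an unusual position within a periodic pattern. The following is a minimal sketch of that idea only, not the SDOoop algorithm: it assumes the period is known in advance and uses a plain phase histogram. The names `phase_bin`, `phase_histogram`, `is_out_of_phase` and the `min_fraction` threshold are hypothetical and chosen for illustration.

```python
def phase_bin(t, period, bins):
    """Map a timestamp to the index of its phase bin within one period."""
    idx = int((t % period) * bins / period)
    return min(idx, bins - 1)  # guard against rounding at the upper edge


def phase_histogram(timestamps, period, bins):
    """Count how many arrivals fall into each phase bin."""
    counts = [0] * bins
    for t in timestamps:
        counts[phase_bin(t, period, bins)] += 1
    return counts


def is_out_of_phase(t, counts, period, min_fraction=0.01):
    """Flag an arrival whose phase bin holds under min_fraction of past arrivals."""
    idx = phase_bin(t, period, len(counts))
    return counts[idx] < min_fraction * sum(counts)


# Hypothetical stream: for 30 days, ten events per day between 08:00 and 09:00.
period = 24.0  # hours
train = [day * period + 8.05 + 0.1 * k for day in range(30) for k in range(10)]
counts = phase_histogram(train, period, bins=24)

in_phase = is_out_of_phase(30 * period + 8.5, counts, period)       # usual hour
out_of_phase = is_out_of_phase(30 * period + 20.5, counts, period)  # unusual hour
print(in_phase, out_of_phase)
```

An event at 20:30 is flagged because its phase bin is empty in the history, even though nothing about the event itself is unusual. Unlike this batch histogram, the paper states that SDOoop works on streams with constant per-sample space and time complexity.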

Theorems & Definitions (5)

  • Definition 1
  • Lemma 1
  • Theorem 1
  • Proof 1
  • Theorem 2