Raising the ClaSS of Streaming Time Series Segmentation

Arik Ermshaus, Patrick Schäfer, Ulf Leser

TL;DR

ClaSS addresses streaming time series segmentation by casting change point (CP) detection as a self-supervised binary classification task over a sliding window: it scores hypothetical splits with a Classification Score Profile (ClaSP) and tests them for significance. It extends the batch ClaSP approach to streaming via an exact streaming time series (TS) k-NN with O(k·d) cost per update and a cross-validation routine that runs in O(d), for an overall O(n·d) time and O(d) space, where d is the sliding window size and n the number of stream observations. Experiments on two large benchmarks and six data archives show that ClaSS significantly outperforms eight competitors in accuracy, while scaling linearly in the sliding window size and delivering practical throughput on Apache Flink. The work provides open-source code, a standalone Python implementation, and a Flink window operator to support reproducibility and real-time deployment.

Abstract

Ubiquitous sensors today emit high-frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g. caused by external events or internal state changes, manifest as changes in the recorded signals. The task of streaming time series segmentation (STSS) is to partition the stream into consecutive variable-sized segments that correspond to states of the observed processes or entities. The partitioning itself must be fast enough to keep pace with the input frequency of the signals. We introduce ClaSS, a novel, efficient, and highly accurate algorithm for STSS. ClaSS assesses the homogeneity of potential partitions using self-supervised time series classification and applies statistical tests to detect significant change points (CPs). In our experimental evaluation using two large benchmarks and six real-world data archives, we found ClaSS to be significantly more precise than eight state-of-the-art competitors. Its space and time complexity are independent of segment sizes and linear only in the sliding window size. We also provide ClaSS as a window operator for the Apache Flink streaming engine, with an average throughput of 1k data points per second.
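
As a rough illustration of the scoring idea in the abstract, the following Python sketch computes a classification score profile for a single, fixed window. It is a simplified batch version under stated assumptions, not the authors' implementation: the function names (`knn_offsets`, `score_profile`), the brute-force k-NN, the exclusion zone of w/2, the default margin, and the use of plain accuracy as the score are all choices made here for brevity. The streaming updates and the significance test are omitted.

```python
import numpy as np


def knn_offsets(window, w, k):
    """Brute-force k-NN offsets for every width-w subsequence of `window`.

    Similarity is the Pearson correlation of z-normalised subsequences.
    Hypothetical helper for illustration only.
    """
    subs = np.lib.stride_tricks.sliding_window_view(window, w)
    mu = subs.mean(axis=1, keepdims=True)
    sigma = subs.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0  # guard against constant subsequences
    z = (subs - mu) / sigma
    corr = z @ z.T / w  # pairwise Pearson correlations

    # Assumed exclusion zone: ignore trivially overlapping neighbours
    # (this also excludes each subsequence itself).
    idx = np.arange(len(subs))
    corr[np.abs(idx[:, None] - idx[None, :]) < w // 2] = -np.inf

    # Offsets of the k most correlated subsequences per row.
    return np.argsort(-corr, axis=1)[:, :k]


def score_profile(window, w, k=3, margin=None):
    """Score every hypothetical split by how well a k-NN classifier
    separates 'left of split' from 'right of split' subsequences."""
    nn = knn_offsets(window, w, k)
    m = len(nn)
    if margin is None:
        margin = 5 * w  # assumed: skip splits too close to the borders
    idx = np.arange(m)
    profile = np.full(m, np.nan)
    for split in range(margin, m - margin):
        y_true = idx >= split                  # self-supervised labels
        votes = (nn >= split).sum(axis=1)      # neighbours labelled 'right'
        y_pred = 2 * votes > k                 # majority vote
        profile[split] = (y_true == y_pred).mean()  # accuracy as the score
    return profile


# Toy usage: a sine wave whose period changes at t = 500.
rng = np.random.default_rng(0)
t = np.arange(500)
signal = np.concatenate([np.sin(2 * np.pi * t / 25), np.sin(2 * np.pi * t / 10)])
signal = signal + 0.05 * rng.standard_normal(signal.size)
profile = score_profile(signal, w=25, k=3)
print("best split:", int(np.nanargmax(profile)))
```

Because the neighbours are computed once and only the labels change per split, the loop over splits is cheap relative to the k-NN step. This brute-force version is quadratic in the window size, which is the kind of cost the streaming k-NN described above is meant to avoid.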

Paper Structure

This paper contains 31 sections, 3 equations, 9 figures, 3 tables, 3 algorithms.

Figures (9)

  • Figure 1: An electrocardiogram (ECG) recording of a human subject showing the transition from normal heartbeats (in blue) to ventricular fibrillation (in orange) [Nolle1986crei]. The ClaSS algorithm continuously scores the TS stream within a sliding window (shown in red), and at $t = 11.2k$ a significant change in the signal shape is detected and immediately reported to the user. This split divides the stream into a fully processed segment and one that is still evolving.
  • Figure 2: A TS stream $S$ from which the last $d=10k$ observations are buffered in a sliding window $T = S_{\tau-d+1,\tau}$, depicted as the red frame. Older (or yet to arrive) data points are greyed out. The sliding window is further cut into subsequences of width $w = 200$, to be analysed for segmentation.
  • Figure 3: The conceptual ClaSS workflow for a human respiration recording that captures the transition from a neutral to an excited state [Schmidt2018IntroducingWA]. (a) The streaming $k$-NN classifier in ClaSS is updated with the newest subsequence (magenta). (b) For every possible offset, the sliding window (red) is transformed into a hypothetical binary classification problem that is evaluated using cross-validation. (c) The result, ClaSP, annotates the sliding window.
  • Figure 4: A workflow example for the $k$-NN classifier update and cross-validation computation in ClaSS. The TS stream contains the beginning of the 2011 Tōhoku earthquake seismogram, captured at Black Forest Observatory [Beyreuther2010ObsPyAP]. (a) The streaming 3-NN updates its means, standard deviations, and dot products to calculate the correlations between the latest subsequence (magenta) and the previous ones. (b) The 3-NN correlations and offsets are updated with the three highest correlations and their locations. (c) The sliding window is repeatedly divided into hypothetical splits and the updated $k$-NN classifier is evaluated to calculate the resulting classification scores (d), which form the ClaSP.
  • Figure 5: Covering segmentation ranks (top) and box plots (bottom) on the $107$ benchmark (left) and $485$ archive (right) TS for ClaSS (lowest rank) and the $8$ competitors.
  • ...and 4 more figures
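
The caption of Figure 4(a) names means, standard deviations, and dot products as the running statistics behind the correlations. A minimal Python sketch of how such statistics can yield Pearson correlations, and how the dot products against the newest subsequence can be refreshed in linear time from the previous ones, is given below. It uses the standard sliding dot-product identity. Whether ClaSS uses exactly this update is an assumption, as are the function names (`correlation_from_stats`, `update_dots`) and the simplification of a growing buffer without eviction.

```python
import numpy as np


def correlation_from_stats(dot, mu_a, sigma_a, mu_b, sigma_b, w):
    """Pearson correlation of two width-w subsequences from their dot
    product, means, and (population) standard deviations."""
    return (dot - w * mu_a * mu_b) / (w * sigma_a * sigma_b)


def update_dots(series, w, prev_dots):
    """Refresh dot products after one new observation was appended.

    prev_dots[j] = <series[j:j+w], series[q:q+w]>, where q = len(prev_dots) - 1
    is the offset of the previously newest subsequence. `series` already
    contains the new point, so its length is q + 1 + w.
    Returns the dot products of every subsequence with the new newest one
    (offset q + 1) in time linear in the number of subsequences.
    """
    q = len(prev_dots) - 1
    j = np.arange(q + 1)
    new_dots = np.empty(q + 2)
    # Shift both subsequences by one: drop the oldest product, add the newest.
    new_dots[1:] = prev_dots - series[j] * series[q] + series[j + w] * series[q + w]
    # The first subsequence has no predecessor, so compute it directly.
    new_dots[0] = series[:w] @ series[q + 1:q + 1 + w]
    return new_dots


# Toy usage with hypothetical data.
rng = np.random.default_rng(1)
w = 8
series = rng.standard_normal(60)
q = 59 - w  # newest subsequence offset before the last point arrived
prev = np.array([series[j:j + w] @ series[q:q + w] for j in range(q + 1)])
dots = update_dots(series, w, prev)

a, b = series[:w], series[q + 1:q + 1 + w]
print(correlation_from_stats(dots[0], a.mean(), a.std(), b.mean(), b.std(), w))
```

Keeping only these statistics per subsequence means no subsequence needs to be re-normalised when a new point arrives, which is in line with the per-update cost stated in the TL;DR.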

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6