Raising the ClaSS of Streaming Time Series Segmentation
Arik Ermshaus, Patrick Schäfer, Ulf Leser
TL;DR
ClaSS addresses streaming time series segmentation (STSS) by casting change point (CP) detection as a self-supervised binary classification task over a sliding window. It scores hypothetical splits with a Classification Score Profile and tests the best split for statistical significance. It extends the batch ClaSP approach to streaming via an exact streaming time series k-nearest-neighbor (k-NN) search with O(k·d) cost per update and a cross-validation routine that runs in O(d). This yields an overall O(n·d) time and O(d) space, where n is the stream length and d the sliding window size. Experiments on two large benchmarks and six real-world data archives show that ClaSS is significantly more accurate than eight competitors, while scaling linearly in the sliding window size and reaching an average throughput of 1k data points per second on Apache Flink. The work provides open-source code, a standalone Python implementation, and a Flink window operator to support reproducibility and real-time deployment.
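A minimal Python sketch of the scoring idea for a single window follows. It labels all subsequences before a hypothetical split as class 0 and the rest as class 1, predicts each label by a k-NN majority vote, and scores every split to form a classification score profile. Several parts are assumptions rather than the paper's method: the k-NN is a naive quadratic search instead of the O(k·d) streaming k-NN, macro F1 is used as the score, and the significance check is a two-proportion z-test chosen as a stand-in because the exact test is not specified here. All function names are hypothetical.

```python
import math


def znorm(seq):
    """Z-normalise a subsequence so k-NN compares shape, not offset or scale."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / len(seq))
    return [0.0] * len(seq) if std == 0 else [(v - mean) / std for v in seq]


def knn(window, w, k):
    """Naive exact k-NN over all width-w subsequences (quadratic; NOT the
    paper's O(k*d) streaming k-NN). Overlapping subsequences are excluded."""
    subs = [znorm(window[i:i + w]) for i in range(len(window) - w + 1)]
    neighbours = []
    for i, query in enumerate(subs):
        dists = sorted(
            (math.dist(query, other), j)
            for j, other in enumerate(subs) if abs(i - j) >= w
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def predict(neighbours, split):
    """k-NN majority vote under the hypothetical labelling
    (subsequences before `split` -> 0, the rest -> 1)."""
    return [int(2 * sum(j >= split for j in nn) > len(nn)) for nn in neighbours]


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the two classes (assumed score function)."""
    scores = []
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / 2


def score_profile(neighbours, min_seg):
    """Classification score for every split leaving >= min_seg on each side."""
    m = len(neighbours)
    return {
        s: macro_f1([int(i >= s) for i in range(m)], predict(neighbours, s))
        for s in range(min_seg, m - min_seg + 1)
    }


def split_is_significant(y_pred, split, alpha=1e-3):
    """Stand-in test (assumption): two-proportion z-test on the share of
    label-1 predictions left vs. right of the split."""
    left, right = y_pred[:split], y_pred[split:]
    pooled = sum(y_pred) / len(y_pred)
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(left) + 1 / len(right)))
    if se == 0:
        return False  # identical predictions everywhere: no evidence of change
    z = (sum(right) / len(right) - sum(left) / len(left)) / se
    return math.erfc(abs(z) / math.sqrt(2)) < alpha


# Toy window: a period-10 sine, then a period-20 sine from t = 120 onward.
series = ([math.sin(2 * math.pi * t / 10) for t in range(120)]
          + [math.sin(2 * math.pi * t / 20) for t in range(120, 240)])
w = 10
nn = knn(series, w=w, k=3)
profile = score_profile(nn, min_seg=w)
best = max(profile, key=profile.get)
print(best, round(profile[best], 3), split_is_significant(predict(nn, best), best))
```

The neighbor lists are computed once per window and only the labels change from split to split, so each split costs a re-vote over fixed neighbor lists. The sketch still evaluates every split independently and is therefore quadratic, unlike the O(d) cross-validation routine described above.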
Abstract
Ubiquitous sensors today emit high-frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g., caused by external events or internal state changes, manifest as changes in the recorded signals. The task of streaming time series segmentation (STSS) is to partition the stream into consecutive variable-sized segments that correspond to states of the observed processes or entities. The partitioning itself must be fast enough to keep pace with the input frequency of the signals. We introduce ClaSS, a novel, efficient, and highly accurate algorithm for STSS. ClaSS assesses the homogeneity of potential partitions using self-supervised time series classification and applies statistical tests to detect significant change points (CPs). In our experimental evaluation using two large benchmarks and six real-world data archives, we found ClaSS to be significantly more precise than eight state-of-the-art competitors. Its space and time complexity is independent of segment sizes and linear only in the sliding window size. We also provide ClaSS as a window operator for the Apache Flink streaming engine, with an average throughput of 1k data points per second.
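The bounded-memory property can be illustrated with a streaming driver that keeps only the most recent points in a fixed-size window and, after each detected CP, discards the data before it. Memory then never exceeds the window size, however long the segments are. This is a sketch under assumptions: `stream_segment` and `mean_shift_detector` are hypothetical names, the detector is a toy placeholder for ClaSS's profile-plus-test step (it is not ClaSS), its threshold is tuned to the toy stream, and cutting the window at each CP is an assumed policy.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Optional

# Hypothetical detector signature: receives the current window and returns the
# window-relative index of a significant CP, or None if there is none.
Detector = Callable[[List[float]], Optional[int]]


def stream_segment(stream: Iterable[float], detect: Detector,
                   window_size: int, step: int = 1) -> Iterator[int]:
    """Yield absolute stream positions of detected CPs.

    Memory is bounded by `window_size`, independent of segment lengths.
    """
    window: deque = deque(maxlen=window_size)
    start = 0  # absolute stream position of window[0]
    for t, value in enumerate(stream):
        if len(window) == window_size:
            start += 1  # the append below evicts the oldest point
        window.append(value)
        if (t + 1) % step:
            continue  # only run the (expensive) detector every `step` points
        cp = detect(list(window))
        if cp is not None and 0 < cp < len(window):
            yield start + cp
            # Assumed policy: forget everything before the CP, so the next
            # search only sees data from the new segment.
            for _ in range(cp):
                window.popleft()
            start += cp


def mean_shift_detector(window: List[float], min_seg: int = 20,
                        threshold: float = 9.9) -> Optional[int]:
    """Toy placeholder, NOT ClaSS: split where left/right means differ most,
    if that difference exceeds a threshold tuned for the toy stream below."""
    best, best_gap = None, threshold
    for s in range(min_seg, len(window) - min_seg + 1):
        left, right = window[:s], window[s:]
        gap = abs(sum(right) / len(right) - sum(left) / len(left))
        if gap > best_gap:
            best, best_gap = s, gap
    return best


stream = [0.0] * 100 + [10.0] * 100 + [0.0] * 100
print(list(stream_segment(stream, mean_shift_detector, window_size=80)))
# → [100, 200]
```

Separating the windowing logic from the detector keeps the driver's cost per point at one detector call on at most `window_size` values; the per-update cost of the real system then depends entirely on how cheaply the detector can be maintained incrementally.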
