Detecting Change Intervals with Isolation Distributional Kernel

Yang Cao, Ye Zhu, Kai Ming Ting, Flora D. Salim, Hong Xian Li, Luxing Yang, Gang Li

TL;DR

This work reframes change-point detection as Change-Interval Detection (CID) and introduces iCID, a kernel-based method that leverages Isolation Distributional Kernel (IDK) to compare distributions of adjacent intervals. By exploiting IDK's finite, data-dependent feature map, iCID detects subtle and obvious distribution shifts with linear-time operations in both offline and online settings, and it includes an automatic, non-learning parameter selection mechanism. Extensive experiments on real-world and synthetic data show iCID achieving state-of-the-art or competitive F1-scores while offering substantial speed advantages over deep-learning baselines and robustness to outliers. The approach holds practical potential for real-time monitoring of large-scale streams, with flexibility to adapt parameter settings without costly optimization.

Abstract

Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify those changes, they still miss subtle changes, scale poorly, and/or are sensitive to outliers. To meet these challenges, we are the first to generalise the CPD problem, treating it as a special case of the Change-Interval Detection (CID) problem. We then propose a CID method, named iCID, based on the recent Isolation Distributional Kernel (IDK). iCID identifies a change interval when the dissimilarity score between two temporally adjacent, non-homogeneous intervals is high. The data-dependent property and finite feature map of IDK enable iCID to efficiently identify various types of change-points in data streams while tolerating outliers. Moreover, the proposed online and offline versions of iCID can optimise their key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets.
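The idea of scoring adjacent intervals by their distributional dissimilarity can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a Voronoi-cell variant of the Isolation Kernel (each of `t` partitionings is built from `psi` randomly sampled points), a kernel mean embedding per interval, and a score of one minus the normalised similarity of consecutive embeddings. The names `ik_feature_map` and `interval_scores`, the fixed non-overlapping `window`, and the synthetic series are all hypothetical choices made for illustration.

```python
import numpy as np

def ik_feature_map(data, psi=16, t=100, seed=0):
    """Build a finite feature map from t random Voronoi partitionings
    (assumed variant of the Isolation Kernel), each with psi centres."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # centres has shape (t, psi, d): psi points sampled per partitioning
    centres = np.stack(
        [data[rng.choice(n, size=psi, replace=False)] for _ in range(t)]
    )

    def phi(points):
        # Squared distances from every point to every centre: (t, m, psi)
        d2 = ((points[None, :, None, :] - centres[:, None, :, :]) ** 2).sum(-1)
        nearest = d2.argmin(-1)  # index of the cell each point falls into
        feats = np.zeros((len(points), t * psi))
        rows = np.arange(len(points))
        for i in range(t):
            # One-hot encode the cell within partitioning i
            feats[rows, i * psi + nearest[i]] = 1.0
        return feats

    return phi

def interval_scores(series, window, psi=16, t=100, seed=0):
    """Dissimilarity between each interval and the one before it."""
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]
    phi = ik_feature_map(series, psi=psi, t=t, seed=seed)
    n_intervals = len(series) // window
    # Kernel mean embedding: average feature vector of each interval
    means = np.stack([
        phi(series[i * window:(i + 1) * window]).mean(axis=0)
        for i in range(n_intervals)
    ])
    norms = np.linalg.norm(means, axis=1)
    sims = (means[1:] * means[:-1]).sum(axis=1) / (norms[1:] * norms[:-1])
    return 1.0 - sims  # scores[i] compares interval i + 1 with interval i

# Synthetic example (illustrative only): a mean shift at index 300
rng = np.random.default_rng(42)
series = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])
scores = interval_scores(series, window=50)
print(np.round(scores, 2), int(np.argmax(scores)))
```

Because the feature map has a fixed length of `t * psi`, each interval is embedded once and each comparison is a single dot product, which is the property behind the linear-time behaviour the summary describes. In this sketch the highest score should fall on the pair of intervals that straddles the shift.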

Paper Structure (30 sections, 10 equations, 19 figures, 4 tables, 2 algorithms)

Figures (19)

  • Figure 1: S1 dataset: comparison of the same proposed distributional kernel-based CID algorithm using GDK vs. IDK. (a) shows the data distribution. (b) and (c) plot the change-point score with the different kernels. Red bars indicate ground-truth change-points. The variances of the five blocks are $1.0$, $2.2$, $4.3$, $48.3$, and $28.3$, respectively. Five outliers were manually added to the first two blocks.
  • Figure 2: MDS result based on the dissimilarity matrix of the intervals of the S1 dataset. The split intervals are shown as points.
  • Figure 3: Illustration of the similarity calculation using the Isolation Kernel ($\psi = 16$): (a) An example partitioning $H$. (b) Contours with reference to point $x = (0.3, 0.3)$.
  • Figure 4: Feature mapping of each time interval $X_i$ as a distribution to a point $\widehat{\Phi}(\mathcal{P}_{X_i})$ in the feature space of IDK or KME. Similar distributions are mapped into the same region of the feature space, and different distributions are mapped into different regions.
  • Figure 5: Illustration of iCID calculation.
  • ...and 14 more figures

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7