Optimal Online Change Detection via Random Fourier Features

Florian Kalinke, Shakeel Gavioli-Akilagun

TL;DR

This work addresses online non-parametric change point detection in multivariate data by combining kernel mean embeddings with a random Fourier feature (RFF) approximation of the maximum mean discrepancy (MMD). The proposed Online RFF-MMD is genuinely online and window-free: a dyadic grid of local tests and efficient feature-map updates give $O(r \log n)$ time per observation and $O(r \log n)$ overall space. The authors establish theoretical guarantees, including control of the average run length and of false alarms, an explicit bound on the detection delay, and a minimax lower bound showing that this delay is optimal up to logarithmic factors. Experiments on synthetic data and MNIST show performance competitive with the state of the art. The method therefore scales to high-volume streams without requiring pre-change training data or window tuning, which makes it suitable for real-time monitoring tasks.

Abstract

This article studies the problem of online non-parametric change point detection in multivariate data streams. We approach the problem through the lens of kernel-based two-sample testing and introduce a sequential testing procedure based on random Fourier features, running with logarithmic time complexity per observation and with overall logarithmic space complexity. The algorithm has two advantages compared to the state of the art. First, our approach is genuinely online, and no access to training data known to be from the pre-change distribution is necessary. Second, the algorithm does not require the user to specify a window parameter over which local tests are to be calculated. We prove strong theoretical guarantees on the algorithm's performance, including information-theoretic bounds demonstrating that the detection delay is optimal in the minimax sense. Numerical studies on real and synthetic data show that our algorithm is competitive with respect to the state of the art.
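The core idea described above can be illustrated with a minimal sketch. It is not the authors' reference implementation: the class name `DyadicRffMmd`, its parameters, the Gaussian kernel, and the unscaled test statistic are all assumptions made for illustration. The sketch maps each observation to random Fourier features, stores only per-window feature sums, and merges equal-sized windows so that at most about $\log_2 n$ windows exist at any time.

```python
import numpy as np


class DyadicRffMmd:
    """Illustrative sketch only; names and statistic are assumptions."""

    def __init__(self, dim, num_features=64, bandwidth=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random frequencies for a Gaussian kernel (assumed kernel choice).
        self.omega = rng.normal(scale=1.0 / bandwidth, size=(num_features, dim))
        self.num_features = num_features
        # Each window is [count, sum of feature vectors]; newest window last.
        self.windows = []

    def features(self, x):
        # z(x) such that z(x) @ z(y) approximates the kernel k(x, y).
        proj = self.omega @ np.asarray(x, dtype=float)
        return np.concatenate([np.cos(proj), np.sin(proj)]) / np.sqrt(self.num_features)

    def insert(self, x):
        self.windows.append([1, self.features(x)])
        # Merge equal-sized neighbours, like carrying in a binary counter.
        while len(self.windows) >= 2 and self.windows[-1][0] == self.windows[-2][0]:
            count, feat_sum = self.windows.pop()
            self.windows[-1][0] += count
            self.windows[-1][1] = self.windows[-1][1] + feat_sum

    def statistics(self):
        """Return (newer_count, MMD estimate) for each split at a window boundary."""
        n = sum(count for count, _ in self.windows)
        total = sum(feat_sum for _, feat_sum in self.windows)
        results = []
        newer_count, newer_sum = 0, np.zeros(2 * self.num_features)
        # Skip the oldest window so the "older" side is never empty.
        for count, feat_sum in reversed(self.windows[1:]):
            newer_count += count
            newer_sum = newer_sum + feat_sum
            older_mean = (total - newer_sum) / (n - newer_count)
            newer_mean = newer_sum / newer_count
            results.append((newer_count, float(np.linalg.norm(older_mean - newer_mean))))
        return results


rng = np.random.default_rng(1)
detector = DyadicRffMmd(dim=2)
for x in rng.normal(size=(6, 2)):
    detector.insert(x)
print([count for count, _ in detector.windows])  # [4, 2]
```

Because only one feature sum per window is kept, both storage and the work per call to `statistics` grow with the number of windows times the feature dimension. A real detector would compare a suitably scaled statistic against a threshold sequence; that step is omitted here.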

Paper Structure

This paper contains 34 sections, 15 theorems, 90 equations, 8 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Let $N$ be the extended stopping time defined via (equation: MMD stop time). For any $\gamma > 1$, if the sequence of thresholds satisfies $\lambda_n \geq \sqrt{2} + \sqrt{2 \log \left( 4 \gamma \log_2 \left( 2 \gamma \right) \right)}$ for all $n \in \mathbb{N}$, it holds that $\mathbb{E}_\infty \dots$
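To give a feel for the size of the threshold condition in Theorem 1, the following sketch evaluates its right-hand side for a few values of $\gamma$. The function name `min_threshold` is hypothetical, and $\log$ is assumed to denote the natural logarithm.

```python
import math


def min_threshold(gamma):
    """Right-hand side of the threshold condition in Theorem 1 (needs gamma > 1)."""
    if gamma <= 1:
        raise ValueError("gamma must be greater than 1")
    # Assumption: "log" in the theorem is the natural logarithm.
    return math.sqrt(2) + math.sqrt(2 * math.log(4 * gamma * math.log2(2 * gamma)))


for gamma in (10, 1_000, 100_000):
    print(f"gamma = {gamma:>7}: lambda_n >= {min_threshold(gamma):.3f}")
```

The bound grows only like the square root of $\log \gamma$, so even very large values of $\gamma$ require modest thresholds.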

Figures (8)

  • Figure 1: Schematic representation of the proposed algorithm after observing the first $n=6$ elements. Merging equal-sized "windows" yields a division along dyadic points.
  • Figure 2: Average runtime ($10$ repetitions) of RFF-MMD per insert operation (left) and total (right).
  • Figure 3: Average detection delay from $\mathbb{P} = \mathcal{N}(\bm 0_{20},\mathbf I_{20})$ to the $\mathbb{Q}$ indicated on top ($d=20$, $\sigma = 2$).
  • Figure 4: Average detection delay from MNIST digit 0 to digits $1$, $2$, and $3$ (left to right).
  • Figure 5: Average detection delay from MNIST digit 0 to digits $4$ to $9$ (left to right).
  • ...and 3 more figures

Theorems & Definitions (26)

  • Example 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Lemma 1
  • Corollary 1
  • Corollary 2
  • Theorem 5
  • proof
  • ...and 16 more