Beyond Euclidean Summaries: Online Change Point Detection for Distribution-Valued Data

Yingyan Zeng, Yujing Huang, Xiaoyu Chen

TL;DR

This work proposes an intrinsic distribution-valued CPD framework that treats streaming batch data as a stochastic process on the 2-Wasserstein space, and detects changes in the law of this process by mapping each empirical distribution to a tangent space relative to a pre-change Fréchet barycenter, yielding a reference-centered local linearization of 2-Wasserstein space.

Abstract

Existing online change-point detection (CPD) methods rely on fixed-dimensional Euclidean summaries, implicitly assuming that distributional changes are well captured by moment-based or feature-based representations. Such summaries can obscure important changes in distributional shape or geometry. We propose an intrinsic distribution-valued CPD framework that treats streaming batch data as a stochastic process on the 2-Wasserstein space. Our method detects changes in the law of this process by mapping each empirical distribution to a tangent space relative to a pre-change Fréchet barycenter, yielding a reference-centered local linearization of 2-Wasserstein space. This representation enables sequential detectors by adapting classical multivariate monitoring statistics to tangent fields. We provide theoretical guarantees and demonstrate, via synthetic and real-world experiments, that our approach detects complex distributional shifts with reduced detection delay at matched $\mathrm{ARL}_0$ compared with moment-based and model-free baselines.
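As a rough illustration of the pipeline described in the abstract, the sketch below works in one dimension, where the 2-Wasserstein barycenter is the average of the quantile functions and the tangent (log-map) coordinates of a batch reduce to the difference between its quantile curve and the barycenter's. It is not the authors' algorithm: the 1-D restriction, the quantile grid `levels`, the small `ridge` term, the Hotelling-type statistic, and all function names are assumptions made for illustration only.

```python
import numpy as np


def quantile_curve(batch, levels):
    """Empirical quantile function of a 1-D batch on a fixed grid of levels."""
    return np.quantile(np.asarray(batch, dtype=float), levels)


def fit_reference(pre_batches, levels, ridge=1e-6):
    """Estimate the pre-change reference from in-control batches.

    In 1-D the 2-Wasserstein barycenter has the averaged quantile curve.
    The covariance of the tangent coordinates is regularized with a small
    ridge (an illustrative assumption) so that it can be inverted.
    """
    curves = np.array([quantile_curve(b, levels) for b in pre_batches])
    barycenter = curves.mean(axis=0)
    tangents = curves - barycenter  # log-map coordinates at the barycenter
    cov = np.cov(tangents, rowvar=False) + ridge * np.eye(len(levels))
    return barycenter, cov


def t2_statistic(batch, barycenter, cov, levels):
    """Hotelling-type statistic of one batch's tangent coordinates."""
    v = quantile_curve(batch, levels) - barycenter
    return float(v @ np.linalg.solve(cov, v))


def first_alarm(stream, barycenter, cov, levels, threshold):
    """Index of the first batch whose statistic exceeds the threshold, else None."""
    for t, batch in enumerate(stream):
        if t2_statistic(batch, barycenter, cov, levels) > threshold:
            return t
    return None
```

In use, one would fit the reference on a set of pre-change batches, pick `threshold` as a high empirical quantile of `t2_statistic` over held-out pre-change batches, and then call `first_alarm` on the incoming stream. Because the statistic acts on the whole quantile curve rather than on a few moments, it can react to a shape change (for example, a unimodal stream becoming bimodal with the same mean and variance) that a mean-and-variance chart would not see.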

Paper Structure

This paper contains 62 sections, 14 theorems, 67 equations, 10 figures, 13 tables, 1 algorithm.

Key Result

Proposition 3.4

Assume $\bar{\mu}\in\mathcal{P}_2(\mathbb{R}^d)$ is absolutely continuous with respect to Lebesgue measure, and consider the quadratic cost $c(x,y)=\|x-y\|^2$. Then for any $\mu\in\mathcal{P}_2(\mathbb{R}^d)$ there exists a $\bar{\mu}$-a.e. unique optimal transport map $T_{\bar{\mu}}^{\mu}$ such that we have the identity
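
The displayed identity is not reproduced in this summary. For orientation only: under the stated assumptions, the standard Brenier-type statement, which fits the proposition's "Radial Isometry" title, says that the optimal map pushes $\bar{\mu}$ forward to $\mu$ and that the $L^2(\bar{\mu})$ norm of the displacement $T_{\bar{\mu}}^{\mu}-\mathrm{id}$ equals the 2-Wasserstein distance. The form below is an assumption about what the missing display contains, and the paper's exact notation may differ.

```latex
(T_{\bar{\mu}}^{\mu})_{\#}\,\bar{\mu} = \mu,
\qquad
W_2^2(\bar{\mu},\mu)
  = \int_{\mathbb{R}^d} \bigl\| T_{\bar{\mu}}^{\mu}(x) - x \bigr\|^2 \,\mathrm{d}\bar{\mu}(x)
  = \bigl\| T_{\bar{\mu}}^{\mu} - \mathrm{id} \bigr\|_{L^2(\bar{\mu})}^2 .
```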

Figures (10)

  • Figure 1: $\mathrm{ARL}_1$ vs. $\mathrm{ARL}_0$ comparison on synthetic continuous streams. IDD (red) shows superior detection speed.
  • Figure 2: Trade-off curves for discrete streams ($N=100$). IDD handles discrete geometry robustly.
  • Figure 3: AML Detection on FlowCAP-II. IDD (red diamond) achieves the best F1-score ($\approx 0.75$) compared to baselines, balancing precision and speed.
  • Figure 4: Reddit Sentiment Stream Monitoring. IDD alarms (red points) align with key news events (blue regions).
  • Figure 5: Multimodal Reweight ($\mathrm{ARL}_1$ vs. $\mathrm{ARL}_0$): Performance comparison across varying sample sizes $n \in \{50, 100, 300\}$ (rows) and dimensions $d \in \{1, 5, 10, 50\}$ (columns).
  • ...and 5 more figures

Theorems & Definitions (33)

  • Remark 3.3
  • Proposition 3.4: Radial Isometry (Theorem 8.6, Villani 2009)
  • Remark 3.5
  • Proposition 3.6: Barycentric projection variance decomposition
  • Remark 3.7
  • Proposition 3.9: Asymptotic Null Distribution
  • proof
  • Theorem 3.10: Sequential false-alarm control under fixed thresholds
  • Corollary 3.11: Empirical-quantile calibration implies ARL control
  • Proposition 3.13: Lipschitz Continuity of the Covariance Kernel
  • ...and 23 more