On High-Dimensional Change-Point Detection Based on Pairwise Distances

Spandan Ghoshal, Bilol Banerjee, Anil K. Ghosh

TL;DR

This paper proposes nonparametric, distance-based change-point detection methods that remain effective when the data dimension $d$ greatly exceeds the sample size. By leveraging pairwise distances and an energy-distance-inspired divergence, the authors develop a scalable statistic with a permutation-based significance test, and they extend the framework to generalized distance functionals $\varphi_{h,\psi}$. Theoretical results establish strong consistency and high-dimensional limits under HDLSS and growing-$n$ regimes, with detailed analyses of sparse signals and various distance choices. Empirical studies on simulated HDLSS data and real stock-price returns demonstrate the methods' competitive performance, particularly in detecting scale changes and higher-order distributional differences where Euclidean-distance methods falter. Overall, the work advances robust, nonparametric change-point detection for high-dimensional applications and suggests practical enhancements like block-distance variants for further resilience.

Abstract

In change-point analysis, one aims at finding the locations of abrupt distributional changes (if any) in a sequence of multivariate observations. In this article, we propose some nonparametric methods based on averages of pairwise distances for this purpose. These distance-based methods can be conveniently used for high-dimensional data even when the dimension is much larger than the sample size (i.e., the length of the sequence). We carry out some theoretical investigations on the behaviour of these methods not only when the dimension of the data remains fixed and the sample size grows to infinity, but also in situations where the dimension diverges to infinity while the sample size may or may not grow with the dimension. Several high-dimensional datasets are analyzed to compare the empirical performance of these proposed methods against some state-of-the-art methods.
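The idea in the abstract can be made concrete with a small sketch. The code below is not the authors' statistic: it assumes a single change-point, Euclidean distances, the classical energy-distance contrast, and a hypothetical weight $t(n-t)/n^2$. The function names (`pairwise_distances`, `split_statistic`, `scan`, `permutation_pvalue`) are likewise illustrative. It shows how one pairwise-distance matrix can be reused for every candidate split and for every permutation, which is what keeps the approach practical when the dimension is much larger than the sample size.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix between the rows of x (n observations, d features)."""
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)   # clip tiny negative values caused by round-off
    np.fill_diagonal(d2, 0.0)     # a point is at distance zero from itself
    return np.sqrt(d2)


def split_statistic(dist, t):
    """Energy-distance-style contrast between observations [0, t) and [t, n).

    The weight t * m / n**2 is an illustrative assumption, not taken from the paper.
    """
    n = dist.shape[0]
    m = n - t
    between = dist[:t, t:].mean()
    within_left = dist[:t, :t].sum() / (t * (t - 1))    # average over distinct pairs
    within_right = dist[t:, t:].sum() / (m * (m - 1))
    return (t * m / n**2) * (2.0 * between - within_left - within_right)


def scan(dist, min_seg=2):
    """Return the split maximising the contrast, and the maximal value.

    min_seg must be at least 2 so that both within-segment averages are defined.
    """
    n = dist.shape[0]
    candidates = range(min_seg, n - min_seg + 1)
    stats = [split_statistic(dist, t) for t in candidates]
    best = int(np.argmax(stats))
    return candidates[best], float(stats[best])


def permutation_pvalue(dist, n_perm=200, min_seg=2, seed=0):
    """Permutation p-value for the maximal contrast, reusing the distance matrix."""
    rng = np.random.default_rng(seed)
    t_hat, observed = scan(dist, min_seg)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(dist.shape[0])
        # Permuting rows and columns together is the same as shuffling the sequence.
        _, stat = scan(dist[np.ix_(p, p)], min_seg)
        exceed += int(stat >= observed)
    return t_hat, observed, (1 + exceed) / (1 + n_perm)
```

As a usage example, one might simulate `x = rng.standard_normal((40, 500))`, shift the last 20 rows with `x[20:] += 1.0`, and call `permutation_pvalue(pairwise_distances(x), n_perm=99)`. The distance matrix is computed once at $O(n^2 d)$ cost, and each permutation only reindexes it.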

Paper Structure

This paper contains 11 sections, 9 theorems, 186 equations, 8 figures, 1 table.

Key Result

Lemma 1

If $\mathbf{X}_1,\mathbf{X}_2\overset{iid}{\sim}\mathrm{F}_{1}$ and $\mathbf{Y}_1,\mathbf{Y}_2\overset{iid}{\sim}\mathrm{F}_{2}$ are independent random vectors with $\mathbb{E}\left\Vert \mathbf{X} \right\Vert+\mathbb{E}\left\Vert \mathbf{Y} \right\Vert<\infty$, then $D(F_1,F_2)=0$ if and only if $\
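This excerpt does not reproduce the definition of $D(F_1,F_2)$. For orientation only, the classical energy distance of Székely and Rizzo, which the TL;DR names as the inspiration for the divergence, is written in the lemma's notation as

$$\mathcal{E}(F_1,F_2) \;=\; 2\,\mathbb{E}\left\Vert \mathbf{X}_1-\mathbf{Y}_1 \right\Vert \;-\; \mathbb{E}\left\Vert \mathbf{X}_1-\mathbf{X}_2 \right\Vert \;-\; \mathbb{E}\left\Vert \mathbf{Y}_1-\mathbf{Y}_2 \right\Vert \;\ge\; 0 .$$

The finite first-moment condition in the lemma is what makes these three expectations well defined. Whether the paper's $D$ coincides with $\mathcal{E}$ or differs in normalization is an assumption here, not something the excerpt confirms.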

Figures (8)

  • Figure 1: Frequency distributions of potential (grey bars) and detected change-points (black bars) in Examples 1 (top row) and 2 (bottom row).
  • Figure 2: Frequency distribution of potential change-points (grey bars) and detected change-points (black bars) by the proposed method in Examples 1 and 2. The dashed line represents the true change-point $\tau = 25$.
  • Figure 3: Frequency distributions of potential (grey bars) and detected (black bars) change-points in Examples 3 and 4.
  • Figure 4: Frequency distribution of the estimated change-point locations for Examples 3 and 4 when $h(t)=t,\;\psi(t)=1-\exp(-t/2)$.
  • Figure 5: Success rates of BG-$\ell_2$ (green curves), BG-$\ell_1$ (red curves) and BG-exp (blue curves) in Examples 5 and 6 when $\beta=0.6$ (dotted curves) and $\beta=0.4$ (solid curves)
  • ...and 3 more figures

Theorems & Definitions (36)

  • Example 1
  • Example 2
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Example 3
  • Example 4
  • Lemma 2
  • Theorem 3
  • Theorem 4
  • ...and 26 more