
Wasserstein-based identification of metastable states in time series data via change point detection and segment clustering

David Gentile, Joshua Huang, James M. Murphy

TL;DR

Segmenting time series at change points obtained by estimating the Wasserstein metric derivative, and then clustering the resulting segments as measures with similarity given by the Wasserstein metric, successfully identifies metastable states in the law of the process.

Abstract

Change point detection for time series analysis is a difficult and important problem in applied statistics, for which a variety of approaches have been developed in the past several decades. Here, the Wasserstein metric is employed as a tool for change point identification in multi-dimensional time series data in order to identify clusters in time series in an unsupervised way. We leverage the simplicity of the optimal transport cost in the 1-dimensional setting to quickly identify both a segmentation (a family of change points for a trajectory) and a clustering for the data when the number of segments is much smaller than the number of data points, making no parametric assumptions about the particular distributions involved. Our change point detection method scales linearly in the size of the data and in the dimension of the samples. We test our approach on idealized synthetic data trajectories, as well as on real-world trajectories from molecular dynamics simulations and underwater acoustics. We find that segmenting these time series at change points obtained by estimating the Wasserstein metric derivative, and then clustering the identified segments as measures with similarity given by the Wasserstein metric, successfully identifies metastable states in the law of the processes.
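To make the change point step concrete, the following is a minimal Python sketch of the idea described in the abstract, not the paper's Algorithm 1 (which is not reproduced on this page). It relies on the standard fact that, in one dimension, the $p$-Wasserstein distance between two equal-size empirical samples reduces to comparing their sorted values. The function names `wasserstein_1d` and `metric_derivative`, the difference-quotient form (distance between adjacent windows divided by the window length `w`), the use of a simple maximum to pick the change point, and the reading of the second parameter of $\mathcal{N}(\cdot,\cdot)$ as a standard deviation are all assumptions made for illustration. The toy series mimics the setup of Figure 1 at a reduced length, with the change placed at $t=1000$.

```python
import random


def wasserstein_1d(a, b, p=1):
    """p-Wasserstein distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling matches sorted values in order,
    so no general optimal transport solver is needed.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(a), sorted(b)
    cost = sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)


def metric_derivative(series, w):
    """Difference-quotient estimate of the Wasserstein metric derivative.

    For each admissible time t, compare the w samples just before t with
    the w samples starting at t, and divide the distance by w.
    (Assumed form; the paper's exact estimator may differ.)
    """
    return {
        t: wasserstein_1d(series[t - w:t], series[t:t + w]) / w
        for t in range(w, len(series) - w + 1)
    }


# Toy trajectory: the law changes abruptly at t = 1000 (hypothetical data).
rng = random.Random(0)
series = [rng.gauss(0.0, 0.85) for _ in range(1000)]
series += [rng.gauss(1.0, 0.75) for _ in range(1000)]

deriv = metric_derivative(series, w=200)
t_hat = max(deriv, key=deriv.get)  # location of the largest estimated derivative
print("estimated change point:", t_hat)
```

This naive version re-sorts every window, so it costs more than the linear scaling stated in the abstract; it is meant only to show where the one-dimensional shortcut enters.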

Paper Structure

This paper contains 17 sections, 1 theorem, 13 equations, 25 figures, 3 algorithms.

Key Result

Proposition 1

For fixed $w$ and $q$ and inputs $\{\vec{X}_{t}\}_{t=1}^T \subset \mathbb{R}^D$, the runtime complexity of Algorithm 1 is $\mathcal{O}(TD)$. Fix $w \in \{1, \dots, \lfloor T/2\rfloor\}$, $q \in (0,1)$, and assume that the number of segments $N$ scales proportionally to the length of the input data $X

Figures (25)

  • Figure 1: The simplest possible instance of a change point: an abrupt change in the law occurs at time $t=5000$, and this change is reflected in the Wasserstein metric derivative. Top: an artificial time series, with the first 5000 samples drawn from a $\mathcal{N}(0, 0.85)$ distribution and the second 5000 samples drawn from a $\mathcal{N}(1, 0.75)$ distribution. Bottom: approximated Wasserstein metric derivative of the time series, with an extremum at $t=5000$ corresponding to the abrupt change in the distribution.
  • Figure 2: A curve on a manifold; the metric derivative is defined in terms of a limit of a difference quotient where the numerator is given by the difference between nearby points.
  • Figure 3: Pipeline of the procedure: first, the time series is split into a family of non-overlapping segments; then a pairwise similarity matrix is computed based on the Wasserstein distance; finally, a clustering algorithm is applied to the similarity matrix to learn labels for the segments and, by extension, for the points of the time series.
  • Figure 4: Table of precision and recall scores for the toy data set. Precision and recall are averaged over tolerances $\mathrm{tol}=0,\dots,100$.
  • Figure 5: Detecting change points in an idealized setting. We generate a synthetic data set by sampling in an alternating pattern from two fixed Laplace distributions, one with mean $100$ and variance $20$ and the other with mean $200$ and variance $20$. We simulate transitions between the two states by sampling from the Wasserstein geodesic (McCann interpolant) between the two states; change points are superimposed on the trajectory as red dashed lines.
  • ...and 20 more figures
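
The pipeline summarized in the Figure 3 caption (split into segments, compute pairwise Wasserstein distances, cluster) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's implementation: the change points are taken as given, the data are a hypothetical one-dimensional toy trajectory alternating between two Gaussian states, and a small hand-written average-linkage routine stands in for whatever clustering algorithm the authors apply to the similarity matrix. All function names (`wasserstein_1d`, `split_segments`, `distance_matrix`, `average_linkage`) are invented for this sketch. Because segments can have different lengths, the 1-Wasserstein distance is computed here as the integral of the absolute difference between the two empirical CDFs, a standard identity in one dimension.

```python
import random


def wasserstein_1d(a, b):
    """1-Wasserstein distance between 1-D samples of possibly unequal size,
    computed as the integral of |F_a - F_b| over the merged sample points."""
    xs, ys = sorted(a), sorted(b)
    pts = sorted(xs + ys)
    i = j = 0
    total = 0.0
    for left, right in zip(pts, pts[1:]):
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


def split_segments(series, change_points):
    """Cut the series into non-overlapping segments at the change points."""
    bounds = [0] + sorted(change_points) + [len(series)]
    return [series[s:e] for s, e in zip(bounds, bounds[1:])]


def distance_matrix(segments):
    """Symmetric matrix of pairwise Wasserstein distances between segments."""
    n = len(segments)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = wasserstein_1d(segments[i], segments[j])
    return dist


def average_linkage(dist, k):
    """Naive agglomerative clustering (average linkage) down to k clusters.
    A stand-in for the clustering step; not the paper's choice of algorithm."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical toy trajectory alternating between two states.
rng = random.Random(1)
lengths, means = [300, 200, 250, 150], [0.0, 5.0, 0.0, 5.0]
series = [rng.gauss(m, 1.0) for n, m in zip(lengths, means) for _ in range(n)]
change_points = [300, 500, 750]  # assumed known for this sketch

segments = split_segments(series, change_points)
dist = distance_matrix(segments)
labels = average_linkage(dist, k=2)
# Extend segment labels to every point of the time series.
point_labels = [lab for seg, lab in zip(segments, labels) for _ in seg]
print(labels)
```

Since the distance matrix is computed between segments rather than between individual points, its size depends on the number of segments, which is why the approach is attractive when segments are far fewer than data points.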

Theorems & Definitions (2)

  • Proposition 1
  • Proof