Fast Approximations and Coresets for (k, l)-Median under Dynamic Time Warping

Jacobus Conradi, Benedikt Kolbe, Ioannis Psarros, Dennis Rohde

TL;DR

Given $n$ curves under DTW, one can directly construct a metric that approximates DTW on this set, which makes the wealth of results on clustering in metric spaces applicable.

Abstract

We present algorithms for the computation of $\varepsilon$-coresets for $k$-median clustering of point sequences in $\mathbb{R}^d$ under the $p$-dynamic time warping (DTW) distance. Coresets under DTW have not been investigated before, and the analysis is not directly accessible to existing methods as DTW is not a metric. The three main ingredients that allow our construction of coresets are the adaptation of the $\varepsilon$-coreset framework of sensitivity sampling, bounds on the VC dimension of approximations to the range spaces of balls under DTW, and new approximation algorithms for the $k$-median problem under DTW. We achieve our results by investigating approximations of DTW that provide a trade-off between accuracy and amenability to known techniques. In particular, we observe that given $n$ curves under DTW, one can directly construct a metric that approximates DTW on this set, permitting the use of the wealth of results on metric spaces for clustering purposes. The resulting approximations are the first with polynomial running time and achieve an approximation factor very similar to that of state-of-the-art techniques. We apply our results to produce a practical algorithm approximating $(k,\ell)$-median clustering under DTW.
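To make the remark that DTW is not a metric concrete, the following is a minimal sketch of $p$-DTW computed by the textbook dynamic program, assuming the usual definition (the $p$-th root of the minimal sum of $p$-th powers of matched point distances over all traversals). The function name `dtw_p` and the three toy curves are illustrative assumptions, not taken from the paper.

```python
import math

def dtw_p(sigma, tau, p=1.0):
    """p-DTW between two point sequences given as lists of coordinate tuples.

    Assumed definition: p-th root of the minimal sum of p-th powers of
    Euclidean distances between matched points, over all traversals.
    """
    m, n = len(sigma), len(tau)
    # cost[i][j]: minimal sum of p-th powers over traversals of
    # sigma[0..i] and tau[0..j]
    cost = [[math.inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = math.dist(sigma[i], tau[j]) ** p
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    cost[i - 1][j] if i > 0 else math.inf,
                    cost[i][j - 1] if j > 0 else math.inf,
                    cost[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
                )
            cost[i][j] = best + d
    return cost[m - 1][n - 1] ** (1.0 / p)

# Hypothetical one-dimensional curves chosen to violate the triangle inequality.
a = [(0.0,), (0.0,), (0.0,)]
b = [(0.0,), (1.0,)]
c = [(1.0,), (1.0,), (1.0,)]

d_ab = dtw_p(a, b)  # 1.0
d_bc = dtw_p(b, c)  # 1.0
d_ac = dtw_p(a, c)  # 3.0, which exceeds d_ab + d_bc
```

The short curve `b` can absorb the repeated vertices of `a` and `c` at almost no cost, whereas matching `a` to `c` directly must pay for every vertex. This is why a detour through `b` is cheaper than the direct distance.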

Paper Structure (16 sections, 39 theorems, 22 equations, 8 figures, 2 algorithms)

Key Result

Theorem 7

Let $F$ be a class of maps from $\mathbb{R}^{s} \times X$ to $\mathbb{R}$, so that for all $x \in X$ and $f \in F$, the function $\alpha \mapsto f(\alpha, x)$ is a polynomial on $\mathbb{R}^s$ of degree $\delta$. Let $H$ be a $\kappa$-combination of $\operatorname{sign}(F)$ ...

Figures (8)

  • Figure 1: Example of a traversal between the red and blue curve realizing the dynamic time warping distance. The sum of the black distances is minimized.
  • Figure 2: Illustration of a coreset (red), i.e. a weighted sparse representation of the original set of curves (in red and black). The weights in this case are $w(X_1)=3$, $w(X_2)=2$ and $w(X_3)=1$.
  • Figure 3: Violated triangle inequality as $\mathop{\mathrm{dtw}}\nolimits(s,t)\approx12$, but $\mathop{\mathrm{dtw}}\nolimits(s,x)\approx 0$ (matching in blue), $\mathop{\mathrm{dtw}}\nolimits(y,t)\approx 0$ (red matching) and $\mathop{\mathrm{dtw}}\nolimits(x,y)\approx 3$ (green matching).
  • Figure 4: Illustration of how the optimal traversals $W_{sx}$, $W_{xy}$ and $W_{yt}$ of visited curves can be 'composed' to yield a set $W$ that induces a traversal $\widetilde{W}$ (in red) of $s$ and $t$. Any single matched pair of vertices in $W_{sx}$, $W_{xy}$ or $W_{yt}$ appears in $W$ at most $|W|\leq\ell+\ell'$ times.
  • Figure 5: Illustration of the metric closure. On the left a distance function on five points represented as a graph. In the middle the shortest path tree rooted at $x$ inducing all values of the metric closure of the distance function from some element to $x$. On the right the metric closure.
  • ...and 3 more figures
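
The metric closure described in the caption of Figure 5 can be sketched as an all-pairs shortest-path computation. The sketch assumes the closure of a symmetric distance function is the shortest-path distance in the complete graph weighted by that function. Floyd-Warshall is used here only for brevity and is not necessarily the procedure used in the paper. The function name `metric_closure` and the 3-point distance matrix are hypothetical.

```python
def metric_closure(dist):
    """Return the all-pairs shortest-path matrix of a symmetric distance matrix.

    dist[i][j] is a non-negative distance that may violate the triangle
    inequality; the result satisfies it by construction.
    """
    n = len(dist)
    closure = [row[:] for row in dist]  # copy so the input stays untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = closure[i][k] + closure[k][j]
                if via_k < closure[i][j]:
                    closure[i][j] = via_k
    return closure

# Hypothetical distances on three points: d(0, 2) = 3 > d(0, 1) + d(1, 2) = 2.
d = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
]
closed = metric_closure(d)
# closed[0][2] is now 2.0, the length of the detour through point 1.
```

The closure never increases a distance, so it underestimates the original function exactly where the triangle inequality was violated. How much it can underestimate DTW is the approximation question the paper addresses.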

Theorems & Definitions (50)

  • Definition 1: $p$-Dynamic Time Warping
  • Definition 2: Problem definition
  • Definition 3: $\varepsilon$-coreset
  • Definition 4: $(\alpha,\beta)$-approximation
  • Definition 5: $(1+\varepsilon)$-approximate $\ell$-simplifications
  • Definition 6: AB99
  • Theorem 7: Theorem 8.3 AB99
  • Lemma 7
  • Lemma 7
  • Definition 8
  • ...and 40 more
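
As a rough illustration of the $\varepsilon$-coreset notion (Definition 3), the sketch below compares the $k$-median cost of a full input set with the weighted cost of a smaller set for one fixed choice of centers. It assumes the standard formulation: the weighted cost on the coreset stays within a factor $(1\pm\varepsilon)$ of the full cost for every center set. Real numbers with absolute difference stand in for curves under DTW, and all names and values are hypothetical. The weights 3, 2, 1 mirror those in the caption of Figure 2.

```python
def kmedian_cost(items, centers, dist, weights=None):
    """Weighted sum of distances from each item to its nearest center."""
    if weights is None:
        weights = [1.0] * len(items)
    return sum(w * min(dist(x, c) for c in centers)
               for x, w in zip(items, weights))

def coreset_error(full, coreset, weights, centers, dist):
    """Relative cost error of the weighted coreset for one set of centers.

    Assumes the full cost is non-zero.
    """
    full_cost = kmedian_cost(full, centers, dist)
    core_cost = kmedian_cost(coreset, centers, dist, weights)
    return abs(core_cost - full_cost) / full_cost

def toy_dist(x, y):
    # Stand-in for DTW between curves: absolute difference of reals.
    return abs(x - y)

full_set = [0.0, 0.0, 0.0, 4.0, 4.0, 9.0]
core_set = [0.0, 4.0, 9.0]
core_weights = [3.0, 2.0, 1.0]

err = coreset_error(full_set, core_set, core_weights, [1.0, 8.0], toy_dist)
```

In this toy case the coreset merely collapses duplicates, so the error is zero for any centers. A genuine $\varepsilon$-coreset must keep the error below $\varepsilon$ for all center sets simultaneously, which a single check like this cannot certify.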