
On the number of iterations of the DBA algorithm

Frederik Brüning, Anne Driemel, Alperen Ergür, Heiko Röglin

TL;DR

This paper analyzes the iteration complexity of the DTW Barycenter Averaging (DBA) algorithm, which seeks a mean time series by minimizing the sum of DTW distances. It shows that the number of iterations can be exponential in the output length $k$ in the worst case, even for $n=2$ input sequences. This lower bound comes from a construction based on Vattani's $k$-means lower-bound framework. It also establishes a smoothed upper bound under Gaussian perturbations that is polynomial in $k$, $n$ and $d$ (and in $m$ for constant $n$). Experiments on the M5 dataset indicate much smaller, sublinear iteration growth on real data, highlighting a gap between worst-case theory and practice. The work adapts techniques from $k$-means analysis, revealing both the potential and the limitations of applying those methods to DBA due to non-monotonic DTW behavior, and it opens avenues for refined worst-case and smoothed analyses that better reflect practical performance.

Abstract

The DTW Barycenter Averaging (DBA) algorithm is a widely used algorithm for estimating the mean of a given set of point sequences. In this context, the mean is defined as a point sequence that minimises the sum of dynamic time warping distances (DTW). The algorithm is similar to the $k$-means algorithm in the sense that it alternately repeats two steps: (1) computing an optimal assignment to the points of the current mean, and (2) computing an optimal mean under the current assignment. The popularity of DBA can be attributed to the fact that it works well in practice, despite the lack of known theoretical guarantees. In our paper, we aim to initiate a theoretical study of the number of iterations that DBA performs until convergence. We assume the algorithm is given $n$ sequences of $m$ points in $\mathbb{R}^d$ and a parameter $k$ that specifies the length of the mean sequence to be computed. We show that, in contrast to its fast running time in practice, the number of iterations can be exponential in $k$ in the worst case, even if the number of input sequences is $n=2$. We complement these findings with experiments on real-world data that suggest this worst-case behaviour is likely degenerate. To better understand the performance of the algorithm on non-degenerate input, we study DBA in the model of smoothed analysis, upper-bounding the expected number of iterations in the worst case under random perturbations of the input. Our smoothed upper bound is polynomial in $k$, $n$ and $d$, and for constant $n$, it is also polynomial in $m$. For our analysis, we adapt the set of techniques that were developed for analysing $k$-means and observe that this set of techniques is not sufficient to obtain tight bounds for general $n$.
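The two alternating steps described in the abstract can be illustrated with a short sketch. It is not the authors' implementation. It assumes DTW with squared Euclidean point costs (the common formulation of DBA, under which the per-point average is the optimal mean for a fixed assignment), a user-supplied initial mean `init` of length $k$, and a stop rule of "the mean no longer changes" with an iteration cap `max_iter`. The names `dtw_path` and `dba` are hypothetical.

```python
import numpy as np


def dtw_path(a, b):
    """DTW between a (k x d) and b (m x d) with squared Euclidean point costs.

    Returns the optimal cost and one optimal warping path as (i, j) index pairs.
    """
    k, m = len(a), len(b)
    # cost[i, j] = squared Euclidean distance between a[i] and b[j]
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    # acc[i, j] = cheapest alignment of a[:i] with b[:j] (1-based, inf border)
    acc = np.full((k + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, k + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from (k, m) to (1, 1) along predecessors attaining the minimum
    i, j = k, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min((st for st in steps if st[0] >= 1 and st[1] >= 1),
                   key=lambda st: acc[st])
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[k, m]), path


def dba(sequences, init, max_iter=100):
    """DBA sketch: alternate DTW assignment and per-point averaging.

    Returns (mean, number of iterations, total DTW cost at each iteration).
    """
    sequences = [np.asarray(seq, dtype=float) for seq in sequences]
    mean = np.array(init, dtype=float)
    costs = []
    for it in range(1, max_iter + 1):
        sums = np.zeros_like(mean)
        counts = np.zeros(len(mean))
        total = 0.0
        # (1) Assignment step: optimal DTW alignment of every input to the mean
        for seq in sequences:
            c, path = dtw_path(mean, seq)
            total += c
            for i, j in path:
                sums[i] += seq[j]
                counts[i] += 1
        costs.append(total)
        # (2) Mean step: each mean point moves to the centroid of its assigned
        # points; every mean index lies on every warping path, so counts > 0
        new_mean = sums / counts[:, None]
        if np.allclose(new_mean, mean):
            return new_mean, it, costs
        mean = new_mean
    return mean, max_iter, costs


if __name__ == "__main__":
    # n = 2 one-dimensional inputs of length m = 4; mean length k = 3
    X = [np.array([[0.0], [1.0], [2.0], [2.0]]),
         np.array([[0.0], [0.0], [1.0], [3.0]])]
    mean, iters, costs = dba(X, init=X[0][:3])
    print(iters, costs, mean.ravel())
```

As in Lloyd's $k$-means, each step can only keep or lower the total cost, so the recorded `costs` are non-increasing. The number of iterations until this loop stops is the quantity the paper bounds.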

Paper Structure

This paper contains 24 sections, 12 theorems, 16 equations, 9 figures, 12 tables.

Key Result

Theorem 2

Let $Q_1,\ldots,Q_N$ be quadratic polynomials in $s$ variables. The number of semi-algebraically connected components of realizable sign conditions of $Q_1,\ldots,Q_N$ on $\mathbb{R}^s$ is $O\left( (2N)^{s} \right)$.
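To make "realizable sign condition" concrete, the sketch below evaluates the sign vector $(\operatorname{sign} Q_1(x), \ldots, \operatorname{sign} Q_N(x))$ on a grid for a hypothetical toy instance that does not come from the paper: two unit circles in the plane ($N = 2$, $s = 2$). Grid sampling only detects sign conditions that happen to contain a grid point. It can therefore under-count, and it does not compute connected components. The helper `sign_vector` is a hypothetical name.

```python
import numpy as np


def sign_vector(polys, point):
    """Sign condition of the polynomials at `point`, as a tuple over {-1, 0, +1}."""
    return tuple(int(np.sign(q(point))) for q in polys)


# Hypothetical toy instance: N = 2 quadratics in s = 2 variables (two unit circles)
Q = [
    lambda p: p[0] ** 2 + p[1] ** 2 - 1.0,          # circle around (0, 0)
    lambda p: (p[0] - 1.0) ** 2 + p[1] ** 2 - 1.0,  # circle around (1, 0)
]

# Grid with step 0.25 (exactly representable), so grid points on a circle give exact zeros
grid = np.arange(-2.0, 3.0 + 1e-9, 0.25)
realized = {sign_vector(Q, (x, y)) for x in grid for y in grid}

N, s = len(Q), 2
print(sorted(realized))
print("sign conditions hit by the grid:", len(realized), "; (2N)^s =", (2 * N) ** s)
```

For this instance the grid finds 8 of the 9 realizable sign conditions. The missing one, $(0, 0)$, holds only at the two intersection points $(1/2, \pm\sqrt{3}/2)$, which are not grid points. Counted exactly, the 9 sign conditions have 10 connected components in total, since $(0,0)$ consists of two isolated points. This is within $(2N)^s = 16$, although the $O(\cdot)$ in the theorem hides constants.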

Figures (9)

  • Figure 1: An arrangement of curves in the plane with the respective sign vectors.
  • Figure 2: Schematic drawing of gadget $G_i$ ($i\geq1$) in $k$-means instance (Left) and in DBA instance (Right).
  • Figure 3: Depiction of the average number of iterations of the DBA algorithm with respect to the number of time series in the chosen sets. Each function graph corresponds to a fixed length of all time series in the sets. The left graphic uses normal scales and the right graphic uses logarithmic scales on both axes.
  • Figure 4: Depiction of the average number of iterations of the DBA algorithm with respect to the length of the series in the chosen sets. Each function graph corresponds to one product department. The left graphic uses normal scales and the right graphic uses logarithmic scales on both axes.
  • Figure 5: Depiction of the average number of iterations of the DBA algorithm with respect to the length of the center point sequence. Each function graph corresponds to one product department. The left graphic uses normal scales and the right graphic uses logarithmic scales on both axes.
  • ...and 4 more figures

Theorems & Definitions (14)

  • Definition 1
  • Theorem 2
  • Lemma 3
  • Lemma 4
  • Theorem 6
  • Lemma 7
  • Theorem 8
  • Theorem 9
  • Lemma 10
  • Lemma 11
  • ...and 4 more