Fast dynamic time warping and clustering in C++

Volkan Kumtepeli, Rebecca Perriment, David A. Howey

TL;DR

The MIP clustering is most effective on small numbers of longer time series, because the DTW computation is faster than other approaches, but the clustering problem becomes increasingly computationally expensive as the number of time series to be clustered increases.

Abstract

We present an approach for computationally efficient dynamic time warping (DTW) and clustering of time-series data. The method frames the dynamic warping of time series datasets as an optimisation problem solved using dynamic programming, and then clusters time series data by solving a second optimisation problem using mixed-integer programming (MIP). There is also an option to use k-medoids clustering for increased speed, when a certificate for global optimality is not essential. The improved efficiency of our approach is due to task-level parallelisation of the clustering alongside DTW. Our approach was tested using the UCR Time Series Archive, and was found to be, on average, 33% faster than the next fastest option when using the same clustering method. This increases to 64% faster when considering only larger datasets (with more than 1000 time series). The MIP clustering is most effective on small numbers of longer time series, because the DTW computation is faster than other approaches, but the clustering problem becomes increasingly computationally expensive as the number of time series to be clustered increases.
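
The dynamic-programming step described in the abstract can be illustrated with a minimal sketch. This is not the DTW-C++ API: the function name `dtw_cost` is hypothetical, and the absolute difference used as the local cost is an assumption, since the paper may use a different measure (for example a squared difference). The sketch keeps only two rows of the cumulative cost matrix $C$, so memory grows with the length of one series rather than with the product of both lengths.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper (not part of DTW-C++): DTW cost between two series.
// Assumption: the local cost is the absolute difference |x_i - y_j|.
double dtw_cost(const std::vector<double>& x, const std::vector<double>& y) {
    const double inf = std::numeric_limits<double>::infinity();
    const std::size_t n = x.size();
    const std::size_t m = y.size();
    if (n == 0 || m == 0) return inf;  // no alignment exists for an empty series

    // prev holds row i-1 of the cumulative cost matrix C, curr holds row i.
    std::vector<double> prev(m + 1, inf);
    std::vector<double> curr(m + 1, inf);
    prev[0] = 0.0;  // cost of aligning two empty prefixes

    for (std::size_t i = 1; i <= n; ++i) {
        curr[0] = inf;  // a non-empty prefix cannot align with an empty one
        for (std::size_t j = 1; j <= m; ++j) {
            const double local = std::abs(x[i - 1] - y[j - 1]);
            // Extend the cheapest of the three admissible predecessor cells.
            curr[j] = local + std::min({prev[j], curr[j - 1], prev[j - 1]});
        }
        std::swap(prev, curr);
    }
    return prev[m];  // final cell of C, i.e. the DTW cost
}
```

For instance, calling `dtw_cost` on the series 1, 2, 3 and 1, 2, 2, 3 gives zero, because the repeated 2 is absorbed by a one-to-many mapping.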

Paper Structure

This paper contains 7 sections, 4 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Two time series with DTW pairwise alignment between each element, showing one-to-many mapping properties of DTW (left). Cost matrix $C$ for the two time series, showing the warping path and final DTW cost at $C_{13,12}$ (right).
  • Figure 2: The DTW costs of all the pairwise comparisons between time series in the dataset are combined to make a distance matrix $D$.
  • Figure 3: Example output from the clustering process, where an entry of 1 indicates that time series $j$ belongs to the cluster with centroid $i$.
  • Figure 4: The speed advantage of DTW-C++ k-medoids clustering over DTAIDistance grows as the number of time series increases.
  • Figure 5: Change in computational time of DTW-C++ using MIP DTW clustering compared to DTAIDistance as the number of time series in the datasets to be clustered increases and the length of time series in the datasets increases.
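
The distance matrix $D$ of Figure 2 and the binary assignment matrix of Figure 3 can be sketched as follows. All names here (`distance_matrix`, `assign_to_medoids`, `total_cost`) are hypothetical and do not come from DTW-C++. The sketch also assumes the centroid indices are already known. Choosing them, whether by k-medoids iterations or by the MIP, is not shown. Any pairwise distance can be passed in, including a DTW cost function.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Assignment = std::vector<std::vector<int>>;

// Build the symmetric distance matrix D from all pairwise comparisons.
// Each pair is independent, so this loop is a natural target for
// task-level parallelism (not implemented in this sketch).
template <class Series, class Dist>
Matrix distance_matrix(const std::vector<Series>& data, Dist dist) {
    const std::size_t p = data.size();
    Matrix D(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < p; ++i)
        for (std::size_t j = i + 1; j < p; ++j)
            D[i][j] = D[j][i] = dist(data[i], data[j]);
    return D;
}

// A[i][j] == 1 means series j belongs to the cluster with centroid i.
// Assumption: `medoids` holds valid indices chosen elsewhere.
Assignment assign_to_medoids(const Matrix& D,
                             const std::vector<std::size_t>& medoids) {
    const std::size_t p = D.size();
    Assignment A(p, std::vector<int>(p, 0));
    if (medoids.empty()) return A;
    for (std::size_t j = 0; j < p; ++j) {
        std::size_t best = medoids.front();
        for (std::size_t i : medoids)
            if (D[i][j] < D[best][j]) best = i;  // ties keep the earlier medoid
        A[best][j] = 1;
    }
    return A;
}

// Sum of distances from every series to its assigned centroid.
double total_cost(const Matrix& D, const Assignment& A) {
    double cost = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < A[i].size(); ++j)
            if (A[i][j] == 1) cost += D[i][j];
    return cost;
}
```

Because $D$ has one entry per pair of series, its size grows quadratically with the number of series, which is consistent with the abstract's remark that clustering becomes more expensive as the number of time series increases.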