Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering

Christopher Holder, Anthony Bagnall

TL;DR

KASBA is a k-means clustering algorithm that uses the Move-Split-Merge elastic distance at all stages of clustering, applies a randomised stochastic subgradient descent to find barycentre centroids, links each stage of clustering to accelerate convergence and exploits the metric property of MSM distance to avoid a large proportion of distance calculations.

Abstract

Time series data has become increasingly prevalent across numerous domains, driving a growing demand for time series machine learning techniques. Among these, time series clustering (TSCL) stands out as one of the most popular machine learning tasks. TSCL serves as a powerful exploratory analysis tool and is also employed as a preprocessing step or subroutine for various tasks, including anomaly detection, segmentation, and classification. The most popular TSCL algorithms are either fast (in terms of run time) but perform poorly on benchmark problems, or perform well on benchmarks but scale poorly. We present a new TSCL algorithm, the $k$-means (K) accelerated (A) Stochastic subgradient (S) Barycentre (B) Average (A) (KASBA) clustering algorithm. KASBA is a $k$-means clustering algorithm that uses the Move-Split-Merge (MSM) elastic distance at all stages of clustering, applies a randomised stochastic subgradient descent to find barycentre centroids, links each stage of clustering to accelerate convergence and exploits the metric property of MSM distance to avoid a large proportion of distance calculations. It is a versatile and scalable clusterer designed for real-world TSCL applications. It allows practitioners to balance run time and clustering performance. We demonstrate through extensive experimentation that KASBA produces significantly better clustering than the faster state-of-the-art clusterers and offers orders of magnitude improvement in run time over the most performant $k$-means alternatives.
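To make the role of the metric property concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard dynamic-programming formulation of MSM for univariate series and a split/merge cost `c = 1.0` chosen purely for illustration. The helper names (`msm_distance`, `assign_with_pruning`) are hypothetical. The pruning rule shown is the generic triangle-inequality test used in accelerated $k$-means variants, which is valid for any metric; the abstract does not state which exact rule KASBA applies.

```python
from typing import List, Sequence, Tuple


def _msm_cost(new: float, a: float, b: float, c: float) -> float:
    """Split/merge cost for value `new` with neighbour `a` and opposite value `b`."""
    if a <= new <= b or b <= new <= a:
        return c
    return c + min(abs(new - a), abs(new - b))


def msm_distance(x: Sequence[float], y: Sequence[float], c: float = 1.0) -> float:
    """Move-Split-Merge distance via the usual O(n*m) cost matrix (c is assumed)."""
    n, m = len(x), len(y)
    cm = [[0.0] * m for _ in range(n)]
    cm[0][0] = abs(x[0] - y[0])
    for i in range(1, n):
        cm[i][0] = cm[i - 1][0] + _msm_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, m):
        cm[0][j] = cm[0][j - 1] + _msm_cost(y[j], x[0], y[j - 1], c)
    for i in range(1, n):
        for j in range(1, m):
            cm[i][j] = min(
                cm[i - 1][j - 1] + abs(x[i] - y[j]),                     # move
                cm[i - 1][j] + _msm_cost(x[i], x[i - 1], y[j], c),       # merge/split in x
                cm[i][j - 1] + _msm_cost(y[j], x[i], y[j - 1], c),       # merge/split in y
            )
    return cm[n - 1][m - 1]


def assign_with_pruning(
    series: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
    c: float = 1.0,
) -> Tuple[List[int], int]:
    """Nearest-centroid assignment that skips distances ruled out by the triangle inequality.

    Returns the labels and the number of series-to-centroid distances actually computed.
    """
    k = len(centroids)
    # Centroid-to-centroid distances are computed once per assignment pass.
    cc = [[msm_distance(centroids[a], centroids[b], c) for b in range(k)] for a in range(k)]
    labels: List[int] = []
    computed = 0
    for s in series:
        best, best_d = 0, msm_distance(s, centroids[0], c)
        computed += 1
        for j in range(1, k):
            # d(s, c_j) >= d(c_best, c_j) - d(s, c_best), so if
            # d(c_best, c_j) >= 2 * d(s, c_best) then c_j cannot be strictly closer.
            if cc[best][j] >= 2.0 * best_d:
                continue
            d = msm_distance(s, centroids[j], c)
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed
```

Because the skip test only relies on the triangle inequality, the labels are identical to those from a brute-force nearest-centroid search; the saving comes purely from the distance calls that are never made. The same argument would not hold for an elastic distance such as DTW, which is not a metric.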

Paper Structure

This paper contains 32 sections, 11 equations, 15 figures, 5 tables, 7 algorithms.

Figures (15)

  • Figure 1: Example of alignment between two time series when using the Euclidean distance and the DTW distance. The dashed gray lines show which points in the red time series are compared with which points in the blue time series.
  • Figure 2: Optimal MSM warping path through $CM_{msm}$ and a visualisation of MSM alignment between the two time series.
  • Figure 3: KASBA against seven benchmark algorithms on test data, averaged over the 98 UCR archive datasets completed by all algorithms using the default train-test split.
  • Figure 4: KASBA against seven benchmark algorithms on test data, averaged over the 109 UCR archive datasets completed by all algorithms using the default train-test split.
  • Figure 5: Summary performance measures for eight clustering algorithms for clustering accuracy, including mean difference (top), wins/ties/losses (middle) and p-value for a one-sided Wilcoxon signed-rank test (unadjusted for multiple testing, bottom).
  • ...and 10 more figures