On time series clustering with k-means

Christopher Holder, Anthony Bagnall, Jason Lines

TL;DR

This work proposes a standard Lloyd's-based model for TSCL that adopts an end-to-end approach, incorporating a specialised distance function not only in the assignment step but also in the initialisation and stopping criteria, creating a unified structure for comparing seven popular Lloyd's-based TSCL algorithms.

Abstract

There is a long history of research into time series clustering using distance-based partitional clustering. Many of the most popular algorithms adapt k-means (also known as Lloyd's algorithm) to exploit time dependencies in the data by specifying a time series distance function. However, these algorithms are often presented with k-means configured in various ways, altering key parameters such as the initialisation strategy. This variability makes it difficult to compare studies because k-means is known to be highly sensitive to its configuration. To address this, we propose a standard Lloyd's-based model for TSCL that adopts an end-to-end approach, incorporating a specialised distance function not only in the assignment step but also in the initialisation and stopping criteria. By doing so, we create a unified structure for comparing seven popular Lloyd's-based TSCL algorithms. This common framework enables us to more easily attribute differences in clustering performance to the distance function itself, rather than variations in the k-means configuration.
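As a rough illustration of the end-to-end idea, the sketch below threads a single distance function through all three places the abstract names: initialisation, assignment, and the stopping criterion. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`kmeanspp_init`, `lloyds`), the use of plain k-means++ seeding, the arithmetic-mean update, the inertia-based tolerance, and the squared Euclidean placeholder distance are all choices made here for illustration. Input is assumed to be equal-length univariate series in an array of shape `(n_series, series_length)`.

```python
import numpy as np


def squared_euclidean(a, b):
    # Placeholder distance; any time series distance with this signature fits.
    return float(np.sum((a - b) ** 2))


def kmeanspp_init(X, k, distance, rng):
    # k-means++-style seeding using the supplied distance (assumed strategy).
    centroids = [X[rng.integers(len(X))].copy()]
    while len(centroids) < k:
        # Distance from each series to its nearest already-chosen centroid.
        min_d = np.array([min(distance(x, c) for c in centroids) for x in X])
        total = min_d.sum()
        # Fall back to uniform sampling if every series coincides with a centroid.
        probs = min_d / total if total > 0 else np.full(len(X), 1.0 / len(X))
        centroids.append(X[rng.choice(len(X), p=probs)].copy())
    return np.array(centroids)


def lloyds(X, k, distance=squared_euclidean, max_iter=50, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Initialisation uses the same distance as the rest of the loop.
    centroids = kmeanspp_init(X, k, distance, rng)
    prev_inertia = np.inf
    for _ in range(max_iter):
        # Assignment step: each series goes to its nearest centroid.
        dists = np.array([[distance(x, c) for c in centroids] for x in X])
        labels = dists.argmin(axis=1)
        # Inertia: sum of distances from each series to its assigned centroid.
        inertia = float(dists[np.arange(len(X)), labels].sum())
        # Stopping criterion: inertia is computed with the same distance.
        if prev_inertia - inertia < tol:
            break
        prev_inertia = inertia
        # Update step: arithmetic mean (an assumption for this sketch).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids, inertia
```

Swapping `squared_euclidean` for another distance changes the seeding, the assignments, and the convergence check together, which is the property that lets performance differences be attributed to the distance rather than to the surrounding configuration.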

Paper Structure

This paper contains 13 sections, 2 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: A flow diagram of the $k$-means algorithm.
  • Figure 2: Examples of different clusterings of the GunPoint dataset. The top clusters (a) are grouped by class label while the bottom clusters (b) are grouped by the participant.
  • Figure 3: CD diagrams of different initialisation strategies for $k$-means over 112 datasets from the UCR archive using the combined test-train split. "random" refers to random initialisation, "random-restarts" refers to random initialisation with 10 restarts, where the restart with the lowest inertia is selected. "forgy" denotes Forgy initialisation, "forgy-restarts" represents Forgy initialisation with 10 restarts, and "g-kmeans++" denotes greedy $k$-means++.
  • Figure 4: CLACC violin plot for different initialisation strategies over the 112 datasets of the UCR archive using the combined test-train split.
  • Figure 5: A violin plot to demonstrate the differences in run time for various initialisation strategies over the 112 UCR datasets with test-train split.
  • ...and 4 more figures