Line Space Clustering (LSC): Feature-Based Clustering using K-medians and Dynamic Time Warping for Versatility

Joanikij Chulev, Angela Mladenovska

TL;DR

This work addresses clustering of high-dimensional, noisy data by transforming each data point into a line in a new line space and clustering with a combined distance that blends Dynamic Time Warping (DTW) shape similarity with Euclidean magnitude. The Line Space Clustering (LSC) framework integrates a Savitzky-Golay smoothing step and a median-based cluster center update, with a weighting parameter $\alpha$ that balances shape against magnitude. Extensive experiments on synthetic and real-world datasets demonstrate that LSC can match or exceed traditional baselines, especially under noisy conditions, and highlight the importance of the line-space representation and DTW for capturing intra-point patterns. The work provides practical insights into parameter choices and offers code for replication, with future work aimed at improving scalability and automating parameter selection. In short, LSC presents a flexible, robust approach to high-dimensional clustering by treating features as sequential patterns and leveraging elastic sequence alignment. The core distance computation is $D(\mathbf{x}_i, \mathbf{x}_j) = \alpha D_{\text{DTW}}(\mathbf{L}_i, \mathbf{L}_j) + (1-\alpha) D_{\text{EUC}}(\mathbf{x}_i, \mathbf{x}_j)$, where $\mathbf{L}_i$ denotes the line-space representation of the data point $\mathbf{x}_i$.
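The combined distance can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that); it assumes that the line $\mathbf{L}_i$ is simply the feature values of $\mathbf{x}_i$ read in feature-index order, that DTW uses an absolute-difference local cost with no warping window, and that no smoothing has been applied. The function names `dtw_distance`, `euclidean_distance`, and `combined_distance` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic O(n*m) dynamic-programming DTW (assumed local cost: |a_i - b_j|)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def euclidean_distance(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def combined_distance(x_i, x_j, alpha=0.5):
    """alpha * DTW(L_i, L_j) + (1 - alpha) * Euclidean(x_i, x_j).

    Assumption: the line L_i is taken to be x_i itself, in feature-index order.
    """
    return alpha * dtw_distance(x_i, x_j) + (1.0 - alpha) * euclidean_distance(x_i, x_j)


# Two points with the same "shape", shifted by one feature index.
a = [0.0, 0.0, 1.0, 2.0, 0.0]
b = [0.0, 1.0, 2.0, 0.0, 0.0]
print(combined_distance(a, b, alpha=1.0))  # shape only (DTW term)
print(combined_distance(a, b, alpha=0.0))  # magnitude only (Euclidean term)
```

For these two vectors the DTW term is 0, because the elastic alignment absorbs the one-index shift, while the Euclidean term is $\sqrt{6}$. Moving $\alpha$ toward 1 therefore treats them as near-identical, and moving it toward 0 treats them as clearly distinct.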

Abstract

Clustering high-dimensional data is a critical challenge in machine learning due to the curse of dimensionality and the presence of noise. Traditional clustering algorithms often fail to capture the intrinsic structures in such data. This paper explores a combination of clustering methods that we call Line Space Clustering (LSC). LSC uses a representation that transforms data points into lines in a newly defined feature space, enabling clustering based on the similarity of feature value patterns, essentially treating features as sequences. LSC employs a combined distance metric that uses Euclidean and Dynamic Time Warping (DTW) distances, weighted by a parameter α, allowing flexibility in emphasizing shape or magnitude similarities. We examine the mechanics of DTW and the Savitzky-Golay filter in depth, explaining their roles in the algorithm. Extensive experiments demonstrate the efficacy of LSC on synthetic and real-world datasets, showing that experimenting with methods optimized for time series can sometimes work surprisingly well on complex datasets, particularly in noisy environments. Source code and experiments are available at: https://github.com/JoanikijChulev/LSC.
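The two remaining ingredients, Savitzky-Golay smoothing of each line and the median-based center update implied by K-medians, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the window length (5) and polynomial order (2) are chosen only because their convolution coefficients $(-3, 12, 17, 12, -3)/35$ are standard, the two points at each edge are left unsmoothed for simplicity, and the names `savgol_smooth_5pt` and `median_center` are hypothetical.

```python
import statistics

# Standard Savitzky-Golay coefficients for a 5-point window, quadratic fit.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def savgol_smooth_5pt(line):
    """Smooth one line (feature values in index order) with a 5-point quadratic filter.

    Assumption: edge points (first two, last two) are returned unchanged.
    Lines shorter than 5 values are returned as-is.
    """
    values = [float(v) for v in line]
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2 : i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return smoothed


def median_center(members):
    """Component-wise median of the lines assigned to one cluster (K-medians update)."""
    return [statistics.median(column) for column in zip(*members)]


# A noisy spike is damped, while an outlying cluster member barely moves the center.
print(savgol_smooth_5pt([0.0, 0.0, 1.0, 0.0, 0.0]))
print(median_center([[1.0, 10.0], [2.0, 20.0], [100.0, 30.0]]))
```

The median is used here instead of the mean because a single extreme member (the value 100 above) shifts a mean substantially but leaves the component-wise median at a typical value, which fits the stated focus on noisy data.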

Paper Structure

This paper contains 57 sections, 10 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Visualization of data points in the line space. Each line represents a data point plotted against feature indices.
  • Figure 2: Effect of Savitzky-Golay smoothing on data lines.
  • Figure 3: Clustered line space after applying LSC. Different colors represent different clusters; k was set to 5.
  • Figure 4: Sub-figure 1 shows LSC clusters (in both 2D and line space); Sub-figure 2 shows K-means clusters; Sub-figure 3 shows Agglomerative clusters.
  • Figure 5: Execution time comparison on arbitrarily created datasets with a high noise index. LSC demonstrates reasonable execution times considering its complexity, with execution time increasing roughly linearly with data complexity.