On the clustering behavior of sliding windows
Boris Alexeev, Wenyan Luo, Dustin G. Mixon, Yan X Zhang
TL;DR
This paper analyzes the pitfalls of clustering sliding windows of time series under $k$-means and spectral clustering, showing that the window length $w$ relative to the series length drives qualitatively different failure modes. It provides theoretical bounds and constructions explaining why small windows yield flat centroids, why near-symmetric window arrangements produce sinusoidal centroids, and why large windows tend to produce interval-based clusters. The results combine probabilistic and spectral-analysis techniques, including PCA, Wedin's $\sin\Theta$ theorem, and Grassmannian distance concepts, and are illustrated with real and synthetic data. Collectively, the findings inform how to interpret cluster structure in sliding-window representations and suggest caution when choosing the window length $w$ and the number of clusters $k$ in time-series clustering tasks.
Abstract
Things can go spectacularly wrong when clustering timeseries data that has been preprocessed with a sliding window. We highlight three surprising failures that emerge depending on how the window size compares with the timeseries length. In addition to computational examples, we present theoretical explanations for each of these failure modes.
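For readers unfamiliar with the preprocessing step, the following is a minimal sketch of how a sliding-window representation might be built before clustering. It assumes a stride of one and a univariate series; the helper name `sliding_windows`, the synthetic random-walk input, and the choice $w = 16$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windows(x, w):
    """Return the (n - w + 1) x w matrix whose i-th row is x[i : i + w].

    Hypothetical helper for illustration; assumes stride 1.
    """
    x = np.asarray(x, dtype=float)
    if not 1 <= w <= x.size:
        raise ValueError("window length w must satisfy 1 <= w <= len(x)")
    # sliding_window_view returns a read-only view; copy to get a writable array.
    return sliding_window_view(x, w).copy()


# Synthetic random walk, used only to exercise the helper.
rng = np.random.default_rng(0)
series = np.cumsum(rng.standard_normal(200))

# Each row of `windows` is one point that a clustering algorithm would receive.
windows = sliding_windows(series, w=16)
```

Note that consecutive rows overlap in all but one entry, so the points handed to the clustering algorithm are far from independent samples.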
