RandomNet: Clustering Time Series Using Untrained Deep Neural Networks
Xiaosheng Li, Wenjie Xi, Jessica Lin
TL;DR
This work addresses time series clustering by introducing RandomNet, which leverages untrained deep networks with random weights to produce diverse representations without backpropagation. By ensembling clustering results from many random representations through a Hybrid Bipartite Graph Formulation, RandomNet achieves scalable, linear-time performance while providing theoretical guarantees on ensemble effectiveness, notably a lower bound $b\ge -2\ln \alpha/\gamma^2$ that is independent of dataset size. Extensive experiments on all 128 UCR datasets show RandomNet attaining competitive or superior Rand Index scores compared to state-of-the-art baselines, with ablation studies and robustness analyses supporting the design choices. The method’s training-free nature and strong scalability make it appealing for large-scale, diverse time series clustering, with potential extensions to multivariate data and domain-specific architectures.
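To make the bound $b\ge -2\ln \alpha/\gamma^2$ concrete, the minimal sketch below evaluates it for a few illustrative values. The function name `min_ensemble_size` and the chosen values of $\alpha$ and $\gamma$ are assumptions made for illustration only and are not taken from the paper. The sketch also assumes $0<\alpha<1$ and $\gamma>0$, so that the right-hand side is positive.

```python
import math


def min_ensemble_size(alpha: float, gamma: float) -> int:
    """Return the smallest integer b with b >= -2 * ln(alpha) / gamma**2.

    Hypothetical helper: alpha and gamma are the symbols from the bound;
    the ranges checked below are assumptions made for this sketch.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    # Round up because b must be an integer that meets or exceeds the bound.
    return math.ceil(-2.0 * math.log(alpha) / gamma ** 2)


if __name__ == "__main__":
    # Illustrative values only; none of these settings come from the paper.
    for alpha, gamma in [(0.05, 0.1), (0.05, 0.2), (0.01, 0.1)]:
        b = min_ensemble_size(alpha, gamma)
        print(f"alpha={alpha}, gamma={gamma} -> b >= {b}")
```

The function takes no dataset-size argument, which mirrors the claim that the bound is independent of dataset size. Because $\gamma$ enters as $1/\gamma^2$, doubling it cuts the required $b$ by roughly a factor of four.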
Abstract
Neural networks are widely used in machine learning and data mining. Typically, these networks need to be trained, which means adjusting the weights (parameters) within the network based on the input data. In this work, we propose a novel approach, RandomNet, that employs untrained deep neural networks to cluster time series. RandomNet uses different sets of random weights to extract diverse representations of time series and then ensembles the clustering relationships derived from these representations to produce the final clustering result. By extracting diverse representations, our model can effectively handle time series with different characteristics. Since all parameters are randomly generated, no training is required during the process. We provide a theoretical analysis of the effectiveness of the method. To validate its performance, we conduct extensive experiments on all 128 datasets in the well-known UCR time series archive and perform statistical analysis of the results. These datasets vary in size and sequence length and come from diverse fields. The experimental results show that the proposed method is competitive with existing state-of-the-art methods.
