A Shapelet-based Framework for Unsupervised Multivariate Time Series Representation Learning
Zhiyu Liang, Jianfeng Zhang, Chen Liang, Hongzhi Wang, Zheng Liang, Lujia Pan
TL;DR
This work tackles unsupervised, general-purpose representation learning for multivariate time series by introducing Contrastive Shapelet Learning (CSL). CSL uses a Shapelet Transformer to encode time series across multiple scales and measures, trained with a multi-grained contrastive objective and a multi-scale alignment loss, both supported by a diverse data augmentation library. Empirical results across 34 real-world datasets show that CSL consistently outperforms competing URL methods and is competitive with fully supervised approaches on several tasks, while also offering interpretable shapelets. The framework scales to long time series, and its representations can be used directly for downstream tasks such as classification, clustering, and anomaly detection.
Abstract
Recent studies have shown great promise in unsupervised representation learning (URL) for multivariate time series, because URL can learn generalizable representations for many downstream tasks without relying on labels, which are often inaccessible. However, existing approaches usually adopt models originally designed for other domains (e.g., computer vision) to encode time series data and rely on strong assumptions to design learning objectives, which limits their performance. To address these problems, we propose a novel URL framework for multivariate time series that learns time-series-specific shapelet-based representations through the popular contrastive learning paradigm. To the best of our knowledge, this is the first work to explore shapelet-based embedding in unsupervised general-purpose representation learning. We design a unified shapelet-based encoder and a novel learning objective with multi-grained contrasting and multi-scale alignment to achieve this goal, and we employ a data augmentation library to improve generalization. We conduct extensive experiments on tens of real-world datasets to assess representation quality on many downstream tasks, including classification, clustering, and anomaly detection. The results demonstrate the superiority of our method over not only URL competitors but also techniques specially designed for the downstream tasks. Our code is publicly available at https://github.com/real2fish/CSL.
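
To make the idea of a shapelet-based representation concrete, the sketch below shows the classic shapelet transform: each feature is the minimum distance between one shapelet and every sliding window of the series. It is an illustration only and not the encoder from the CSL repository. The function names `shapelet_distance` and `shapelet_transform` are hypothetical, Euclidean distance is an assumed choice of measure, and the shapelets are fixed arrays, whereas the proposed framework learns them.

```python
import numpy as np


def shapelet_distance(series: np.ndarray, shapelet: np.ndarray) -> float:
    """Minimum Euclidean distance between a shapelet and all windows of a series.

    series:   array of shape (T, D), T time steps and D variables
    shapelet: array of shape (L, D) with L <= T
    """
    length = shapelet.shape[0]
    # All length-L windows along the time axis: shape (T - L + 1, D, L).
    windows = np.lib.stride_tricks.sliding_window_view(series, length, axis=0)
    # Reorder to (T - L + 1, L, D) so each window lines up with the shapelet.
    windows = np.moveaxis(windows, -1, 1)
    dists = np.sqrt(((windows - shapelet) ** 2).sum(axis=(1, 2)))
    return float(dists.min())


def shapelet_transform(series: np.ndarray, shapelets: list) -> np.ndarray:
    """Embed a series as one minimum-distance feature per shapelet.

    Shapelets may have different lengths, which gives a simple multi-scale view.
    """
    return np.array([shapelet_distance(series, s) for s in shapelets])


# Toy usage: a 2-variable series that contains the first shapelet exactly.
pattern = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
series = np.zeros((10, 2))
series[3:6] = pattern
features = shapelet_transform(series, [pattern, np.ones((5, 2))])
```

A feature near zero means the shapelet occurs almost exactly somewhere in the series, which is what makes this kind of representation interpretable.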
