Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding
Sana Tonekaboni, Danny Eytan, Anna Goldenberg
TL;DR
The paper tackles unsupervised representation learning for non-stationary multivariate time series, focusing on medical data, where labels are scarce. It introduces Temporal Neighborhood Coding (TNC), which defines local temporal neighborhoods with stationary properties and trains an encoder with a debiased contrastive objective that uses Positive-Unlabeled (PU) weighting. On simulated and real-world datasets (electrocardiogram (ECG) and human activity recognition (HAR)), TNC outperforms the Contrastive Predictive Coding (CPC) and Triplet-Loss baselines in both clusterability and classification, with classification performance approaching that of supervised methods. The approach is architecture-agnostic and scalable, and its representations trace interpretable trajectories of latent state changes, which suggests applications such as anomaly detection and patient trajectory analysis.
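The neighborhood idea can be illustrated as a window-sampling step. This is a minimal sketch, assuming that neighbors are windows whose centers are drawn from a Gaussian around the reference time `t` with standard deviation `eta * window`, and that non-neighbors are windows centered more than three such standard deviations away. The summary above does not say how the neighborhood width is chosen, so `eta` here is a hand-set hypothetical parameter, not a value derived from any stationarity analysis; `window_at`, `valid_center_range`, `sample_neighbor_center`, and `sample_non_neighbor_center` are illustrative names, not the authors' code.

```python
import math
import random

def window_at(x, center, window):
    """Return the length-`window` slice of x whose center index is `center`."""
    half = window // 2
    return x[center - half : center - half + window]

def valid_center_range(window, series_len):
    """Smallest and largest centers for which window_at stays inside the series."""
    half = window // 2
    return half, series_len - (window - half)

def sample_neighbor_center(t, window, eta, series_len, rng):
    """Draw a neighbor center from a Gaussian around t (std = eta * window).

    Assumption: eta is fixed by hand; the sample is clipped into the valid range.
    """
    lo, hi = valid_center_range(window, series_len)
    center = int(round(rng.gauss(t, eta * window)))
    return min(max(center, lo), hi)

def sample_non_neighbor_center(t, window, eta, series_len, rng):
    """Draw a center uniformly from positions farther than 3 * eta * window from t.

    Assumption: the 3-standard-deviation cutoff is an illustrative choice.
    """
    lo, hi = valid_center_range(window, series_len)
    radius = 3 * eta * window
    candidates = [c for c in range(lo, hi + 1) if abs(c - t) > radius]
    if not candidates:
        raise ValueError("series too short to sample a non-neighbor for this eta")
    return rng.choice(candidates)

# Usage on a toy signal whose frequency changes halfway (a non-stationary series).
rng = random.Random(0)
x = [math.sin(i / 5) if i < 250 else math.sin(i / 20) for i in range(500)]
t, window, eta = 100, 20, 1.0
n_center = sample_neighbor_center(t, window, eta, len(x), rng)
k_center = sample_non_neighbor_center(t, window, eta, len(x), rng)
anchor, neighbor, non_neighbor = (window_at(x, c, window) for c in (t, n_center, k_center))
```

Sampling neighbors from a distribution centered on `t`, rather than taking a fixed offset, keeps the neighborhood "soft": nearby windows are most likely, but some variety in distance is still seen during training.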
Abstract
Time series are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting patients' underlying latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks across multiple datasets.
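The debiased contrastive objective can be read as a binary cross-entropy in which windows sampled outside the neighborhood are treated as unlabeled rather than as certain negatives. The sketch below is one such reading, under stated assumptions: a discriminator `D` outputs the probability that two encodings come from the same neighborhood; neighbors are scored with `log D`; each non-neighbor is scored with a mix of `w * log D` (it might secretly be a positive) and `(1 - w) * log(1 - D)` (it is probably a negative), where `w` is an assumed weight. The names `pu_weighted_loss`, `dot_discriminator`, and `sigmoid` are hypothetical, and the dot-product discriminator is a simplified stand-in chosen for brevity; the paper's discriminator may be a different model.

```python
import math

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def dot_discriminator(z_a, z_b):
    """Hypothetical stand-in for D: sigmoid of the dot product of two encodings."""
    return sigmoid(sum(a * b for a, b in zip(z_a, z_b)))

def pu_weighted_loss(p_neighbor, p_non_neighbor, w, eps=1e-7):
    """Contrastive loss for one reference window with Positive-Unlabeled weighting.

    p_neighbor:     D(z_t, z_l) for windows sampled inside the neighborhood (positives).
    p_non_neighbor: D(z_t, z_k) for windows sampled outside it (unlabeled).
    w:              assumed probability that an unlabeled window is really a positive.
    """
    clip = lambda p: min(max(p, eps), 1.0 - eps)  # keep log() finite
    pos = -sum(math.log(clip(p)) for p in p_neighbor) / len(p_neighbor)
    unl = -sum(w * math.log(clip(p)) + (1.0 - w) * math.log(1.0 - clip(p))
               for p in p_non_neighbor) / len(p_non_neighbor)
    return pos + unl

# Usage with toy 2-D encodings of a reference, a neighbor, and a non-neighbor.
z_t, z_l, z_k = [1.0, 2.0], [0.9, 2.1], [-1.5, -0.5]
loss = pu_weighted_loss([dot_discriminator(z_t, z_l)],
                        [dot_discriminator(z_t, z_k)], w=0.05)
```

Setting `w = 0` recovers a plain binary cross-entropy that treats every non-neighbor as a negative. A small positive `w` reduces the penalty when a window drawn from outside the neighborhood happens to come from the same latent state (for example, a state that recurs later in the recording), which is the sampling bias that PU weighting is meant to counteract.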
