Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding

Sana Tonekaboni, Danny Eytan, Anna Goldenberg

TL;DR

The paper tackles unsupervised representation learning for non-stationary multivariate time series, with a focus on medical data where labels are scarce. It introduces Temporal Neighborhood Coding (TNC), which defines local temporal neighborhoods with stationary properties and trains an encoder with a debiased contrastive objective that uses Positive-Unlabeled (PU) weighting. On simulated and real-world datasets (ECG and human activity recognition, HAR), TNC produces more clusterable representations than the Contrastive Predictive Coding (CPC) and Triplet-Loss baselines, and its classification performance approaches that of supervised methods. The approach is architecture-agnostic and scalable, and it yields interpretable trajectories of latent state changes, with potential applications such as anomaly detection and patient trajectory analysis.

Abstract

Time series are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting the underlying patients' latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks for multiple datasets.
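The debiased contrastive objective can be illustrated with a small sketch. The abstract does not spell out the loss, so the form below is an assumption: a binary cross-entropy over discriminator outputs in which windows from outside the neighborhood are treated as unlabeled rather than negative, and a weight `w` stands for the assumed chance that such a window is in fact a positive. The function name `pu_weighted_loss` and the default `w=0.05` are hypothetical.

```python
import numpy as np

def pu_weighted_loss(p_neighbor, p_non_neighbor, w=0.05, eps=1e-12):
    """Sketch of a Positive-Unlabeled weighted contrastive loss.

    p_neighbor:     discriminator probabilities for (reference, neighboring window) pairs
    p_non_neighbor: discriminator probabilities for (reference, non-neighboring window) pairs
    w:              assumed probability that a non-neighboring window is actually a positive
    """
    p_pos = np.clip(np.asarray(p_neighbor, dtype=float), eps, 1 - eps)
    p_unl = np.clip(np.asarray(p_non_neighbor, dtype=float), eps, 1 - eps)
    # Neighboring windows are treated as positives.
    pos_term = np.log(p_pos)
    # Non-neighboring windows are unlabeled: mostly negative, positive with weight w.
    unl_term = (1 - w) * np.log(1 - p_unl) + w * np.log(p_unl)
    return -(pos_term.mean() + unl_term.mean())

uninformative = pu_weighted_loss([0.5, 0.5], [0.5, 0.5])
confident = pu_weighted_loss([0.9, 0.9], [0.1, 0.1])
```

Setting `w=0` recovers a standard contrastive loss that treats every non-neighboring window as a negative; a positive `w` softens the penalty for non-neighboring windows that happen to share the reference window's latent state.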

Paper Structure

This paper contains 27 sections, 4 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Overview of the TNC framework components. For each sample window $W_t$ (indicated with the dashed black box), we first define the neighborhood distribution. The encoder learns the distribution of windows sampled from $N_t$ and $\bar{N_t}$ in the representation space. Samples from these distributions are then fed into the discriminator alongside $Z_t$ to predict the probability of the windows being in the same neighborhood.
  • Figure 4: t-SNE visualization of signal representations for the simulated dataset across all baselines. Each data point in the plot is a 10-dimensional representation of a window of time series of size $\delta=50$, and the color indicates the latent state of the signal window. See the appendix for similar plots from different datasets.
  • Figure 5: Trajectory of a signal encoding from the simulated dataset. The top plot shows the original time series with shaded regions indicating the underlying state. The bottom plot shows the 10-dimensional encoding of the sliding windows $W_t$ where $\delta=50$.
  • Figure A.1: A normalized time series sample from the simulated dataset. Each row represents a single feature, and the shaded regions indicate one of the $4$ underlying simulated states.
  • Figure A.2: t-SNE visualization of waveform signal representations for unsupervised representation learning baselines. Each point in the plot is a 64-dimensional representation of a window of time series, with the color indicating the latent state.
  • ...and 4 more figures
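
The Figure 1 caption describes drawing windows from a neighborhood $N_t$ and from its complement $\bar{N_t}$ around a reference window $W_t$. A minimal sketch of that sampling step follows. The captions give only the window size $\delta$, so everything else here is an assumption made for illustration: the Gaussian neighborhood, the width parameter `eta`, the cutoff `radius` for non-neighbors, and the helper names.

```python
import numpy as np

def sample_window_centers(t, series_len, delta=50, eta=3, n_samples=5, rng=None):
    """Draw center indices for neighboring and non-neighboring windows of W_t."""
    rng = np.random.default_rng() if rng is None else rng
    half = delta // 2
    # Valid centers are those for which a full window of size delta fits in the series.
    lo, hi = half, series_len - (delta - half)
    # Assumption: neighbors are centered on a Gaussian around t with std eta * delta.
    neighbors = rng.normal(loc=t, scale=eta * delta, size=n_samples)
    neighbors = np.clip(np.rint(neighbors), lo, hi).astype(int)
    # Assumption: non-neighbors are uniform over centers farther than `radius` from t.
    radius = 3 * eta * delta
    candidates = np.arange(lo, hi + 1)
    candidates = candidates[np.abs(candidates - t) > radius]
    if candidates.size == 0:
        raise ValueError("series too short to draw non-neighboring windows")
    non_neighbors = rng.choice(candidates, size=n_samples, replace=True)
    return neighbors, non_neighbors

def extract_window(x, center, delta=50):
    """Slice a window of size delta from x with shape (n_features, n_timesteps)."""
    start = center - delta // 2
    return x[:, start:start + delta]

x = np.zeros((3, 2000))  # placeholder signal with 3 features
nbrs, non_nbrs = sample_window_centers(
    t=1000, series_len=x.shape[1], n_samples=8, rng=np.random.default_rng(0)
)
windows = [extract_window(x, c) for c in np.concatenate([nbrs, non_nbrs])]
```

Each extracted window would then be passed through the encoder, and the resulting encodings paired with $Z_t$ as input to the discriminator.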