Self-Supervised Dynamical System Representations for Physiological Time-Series

Yenho Chen, Maxwell A. Xu, James M. Rehg, Christopher J. Rozell

TL;DR

Physiological time-series SSL often conflates underlying system dynamics with sample-specific noise. The authors propose PULSE, a cross-reconstruction objective grounded in a dynamical-systems generative model to recover shared system parameters while discarding sample-specific factors, with theoretical conditions for successful recovery. Synthetic experiments on chaotic systems validate the approach, and real-world datasets demonstrate improved linear probe performance, label efficiency, and transferability across domains. By explicitly prioritizing system information, PULSE achieves robust representations that generalize better to downstream tasks than conventional SSL methods.

Abstract

The effectiveness of self-supervised learning (SSL) for physiological time series depends on the ability of a pretraining objective to preserve information about the underlying physiological state while filtering out unrelated noise. However, existing strategies are limited by their reliance on heuristic principles or poorly constrained generative tasks. To address this limitation, we propose a pretraining framework that exploits the information structure of a dynamical systems generative model across multiple time series. This framework reveals our key insight: class identity can be efficiently captured by extracting information about the generative variables related to the system parameters shared across similar time series samples, while noise unique to individual samples should be discarded. Building on this insight, we propose PULSE, a cross-reconstruction-based pretraining objective for physiological time series datasets that explicitly extracts system information while discarding non-transferable sample-specific information. We establish theory that provides sufficient conditions for the system information to be recovered, and empirically validate it using a synthetic dynamical systems experiment. Furthermore, we apply our method to diverse real-world datasets, demonstrating that PULSE learns representations that can broadly distinguish semantic classes, increase label efficiency, and improve transfer learning.

Paper Structure

This paper contains 25 sections, 2 theorems, 10 equations, 5 figures, 7 tables.

Key Result

Theorem 1

Given two time series ${\bf Y}_i$ and ${\bf Y}_j$ independently sampled from the same system (i.e., ${\bf \Theta}_i = {\bf \Theta}_j={\bf \Theta}^{(s)}$) under the generative process defined by Eq. eq:dataset-level-generative-process and Assumption assumption:data-generate, the minimal set of latent

Figures (5)

  • Figure 1: Intuition behind PULSE. A dynamical systems model of a physiological time-series dataset allows us to distinguish between information that is shared between similar time series and information that is sample-specific and not transferable. PULSE leverages this distinction to learn representations that preserve shared system information while discarding sample-specific information.
  • Figure 2: Our graphical model of multiple time-series windows, based on dynamical systems, distinguishes transferable system information shared across similar time-series from non-transferable information unique to each sample such as initial conditions and process noise.
  • Figure 3: PULSE aims to recover system information through an inference process that uses two encoders: $f_{\rm sys}$ to estimate the shared parameters of a latent dynamical system, and $f_{\rm init}$ to estimate sample-specific initial conditions. By requiring ${\bf \Theta}_i$ to support reconstruction of randomly sampled ${\bf X}_{i, t_0}$, we encourage the recovered system information to be invariant to the sample-specific information.
  • Figure 4: We illustrate how these different masking strategies recover different sources of information in our data-generating process for sample pairs $({\bf Y}_i, {\bf Y}_j)$ where $i,j \in \mathcal{I}_s$. ${\bf Y}$ marks an observable that is removed from the input and used as a reconstruction target. Blue highlights $\mathcal{C}$, representing the information that is recovered during pretraining. Theorem 1 predicts that $\mathcal{C}=\{{\bf \Theta}^{(s)}\}$ only when information from one sample is fully removed. Gray boxes group latent variables that are specific to each time-series sample.
  • Figure 5: t-SNE visualizations of PULSE representations.
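
The distinction drawn in Figures 1 and 2, between system parameters shared across similar time series and sample-specific initial conditions and process noise, can be illustrated with a small simulation. The sketch below is not the authors' code or experimental setup. It assumes a Lorenz system as the chaotic system, Euler-Maruyama integration, additive Gaussian process noise, and direct observation of the latent state; the function names `simulate_lorenz` and `sample_similar_pair`, and all numeric settings, are hypothetical choices made for illustration.

```python
import random


def simulate_lorenz(theta, x0, n_steps=200, dt=0.01, noise_std=0.05, seed=0):
    """Roll out a Lorenz system with additive process noise (Euler-Maruyama).

    theta: (sigma, rho, beta), the system parameters (shared information).
    x0:    initial condition (sample-specific information).
    seed:  controls the process noise (sample-specific information).
    """
    sigma, rho, beta = theta
    rng = random.Random(seed)
    x, y, z = x0
    scale = noise_std * dt ** 0.5  # noise scaling for a step of size dt
    trajectory = []
    for _ in range(n_steps):
        # Evaluate the drift at the current state before updating any coordinate.
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x += dx * dt + scale * rng.gauss(0.0, 1.0)
        y += dy * dt + scale * rng.gauss(0.0, 1.0)
        z += dz * dt + scale * rng.gauss(0.0, 1.0)
        trajectory.append((x, y, z))
    return trajectory


def sample_similar_pair(theta, seed=0):
    """Draw two time series that share theta but nothing else."""
    rng = random.Random(seed)
    pair = []
    for _ in range(2):
        # Each sample gets its own initial condition and its own noise seed.
        x0 = tuple(rng.uniform(-10.0, 10.0) for _ in range(3))
        pair.append(simulate_lorenz(theta, x0, seed=rng.randrange(10**6)))
    return pair


# Assumed parameter values (the classic chaotic Lorenz regime).
theta_s = (10.0, 28.0, 8.0 / 3.0)
y_i, y_j = sample_similar_pair(theta_s, seed=42)
```

Here `y_i` and `y_j` play the role of a pair of similar time series: the only thing they have in common is `theta_s`. Under this reading, a representation that helps reconstruct one trajectory from the other can only rely on the shared parameters, which is the intuition the cross-reconstruction objective builds on.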

Theorems & Definitions (4)

  • Definition 3.1: Similar Time-Series
  • Theorem 1
  • Theorem 1
  • proof