Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data

Duy Nguyen, Jiachen Yao, Jiayun Wang, Julius Berner, Animashree Anandkumar

TL;DR

FGNO tackles the challenge of learning flexible, multi-scale representations from unlabeled time-series data under varying sampling rates and data scarcity. It combines flow matching with neural operators by embedding signals into STFT spectrograms and learning a time-conditioned backbone, where representation granularity is controlled via the layer index $l$ and flow time $s$. After pretraining, FGNO exploits clean inputs for downstream probing, achieving strong gains over MAE/contrastive SSL and even a time-series foundation model across biomedical tasks, including up to 35% AUROC improvements and robust performance with only 5% labeled data. The approach yields resolution-invariant, high-signal representations that generalize across resolutions and domains, enabling data-efficient, accurate downstream discrimination and prediction in healthcare time-series applications.

Abstract

Self-supervised learning (SSL) is a powerful paradigm for learning from unlabeled time-series data. However, popular methods such as masked autoencoders (MAEs) rely on reconstructing inputs from a fixed, predetermined masking ratio. Instead of this static design, we propose treating the corruption level as a new degree of freedom for representation learning, enhancing flexibility and performance. To achieve this, we introduce the Flow-Guided Neural Operator (FGNO), a novel framework combining operator learning with flow matching for SSL training. FGNO learns mappings in functional spaces by using Short-Time Fourier Transform to unify different time resolutions. We extract a rich hierarchy of features by tapping into different network layers and flow times that apply varying strengths of noise to the input data. This enables the extraction of versatile representations, from low-level patterns to high-level global features, using a single model adaptable to specific tasks. Unlike prior generative SSL methods that use noisy inputs during inference, we propose using clean inputs for representation extraction while learning representations with noise; this eliminates randomness and boosts accuracy. We evaluate FGNO across three biomedical domains, where it consistently outperforms established baselines. Our method yields up to 35% AUROC gains in neural signal decoding (BrainTreeBank), 16% RMSE reductions in skin temperature prediction (DREAMT), and over 20% improvement in accuracy and macro-F1 on SleepEDF under low-data regimes. These results highlight FGNO's robustness to data scarcity and its superior capacity to learn expressive representations for diverse time series.
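
To make the two ideas in the abstract concrete (corruption controlled by a flow time, and clean inputs at extraction time), here is a minimal NumPy sketch. It is not the authors' implementation. The linear interpolation path with $s=0$ as pure noise and $s=1$ as clean data, the additive time conditioning, and the names `corrupt`, `make_toy_backbone`, and `extract_features` are all assumptions made for illustration; the toy backbone is a stack of fixed random layers, not a neural operator.

```python
import numpy as np

def corrupt(x_clean, s, rng):
    """Assumed linear path: s=0 is pure Gaussian noise, s=1 is the clean input."""
    noise = rng.standard_normal(x_clean.shape)
    x_s = (1.0 - s) * noise + s * x_clean
    velocity = x_clean - noise  # derivative of x_s with respect to s (regression target)
    return x_s, velocity

def make_toy_backbone(dim, n_layers, seed=0):
    """Hypothetical stand-in for a time-conditioned backbone (not the FGNO architecture)."""
    w_rng = np.random.default_rng(seed)
    weights = [w_rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(n_layers)]

    def forward(x, s):
        # Return the hidden activations of every layer, conditioned on flow time s.
        h, hidden = x, []
        for w in weights:
            h = np.tanh(h @ w + s)  # s enters as a simple additive conditioning (assumption)
            hidden.append(h)
        return hidden

    return forward

def extract_features(forward, x_clean, layer, s, rng=None):
    """Features of one layer at flow time s.

    rng=None feeds the clean input (deterministic); passing an rng corrupts
    the input first, mimicking the noisy-input variant.
    """
    x_in = x_clean if rng is None else corrupt(x_clean, s, rng)[0]
    return forward(x_in, s)[layer]

x = np.random.default_rng(1).standard_normal((4, 8))  # 4 toy inputs of dimension 8
backbone = make_toy_backbone(dim=8, n_layers=3)

clean_a = extract_features(backbone, x, layer=1, s=0.5)
clean_b = extract_features(backbone, x, layer=1, s=0.5)
noisy_a = extract_features(backbone, x, layer=1, s=0.5, rng=np.random.default_rng(2))
noisy_b = extract_features(backbone, x, layer=1, s=0.5, rng=np.random.default_rng(3))

print(np.array_equal(clean_a, clean_b))  # do clean-input features repeat exactly?
print(np.allclose(noisy_a, noisy_b))     # do noisy-input features agree across noise draws?
```

In this sketch the pair `(layer, s)` plays the role of the $(l, s)$ knob: the same backbone yields a different representation for each combination, and the clean-input path involves no random draw at extraction time.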

Paper Structure (33 sections, 8 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: (a) A common self-supervised learning (SSL) baseline is the Masked Autoencoder (MAE; He et al., 2021), where the input data is randomly masked at a fixed ratio and then fed to an encoder and decoder to reconstruct the clean data. MAE learns useful representations by inpainting the missing part. (b) We ask whether the ratio can vary continuously and propose the flow-guided neural operator (FGNO), which is based on flow matching and progressively transforms noisy inputs, corrupted at flow time $s$, into clean data by predicting intermediate velocities. Both methods first transform the time-series data to spectrograms via STFT (short-time Fourier transform) to extract local time-frequency features. (c) FGNO is pre-trained in a self-supervised manner using the flow-matching objective. FGNO learns in function space and empirically shows improved performance across different sampling rates of the input data. The decoder shown is a shallow spectrogram reconstruction head used solely during pretraining and discarded for downstream tasks. (d) After SSL pretraining, representations are probed by training a small classifier for downstream tasks. Compared with existing generative SSL methods, we use clean input data instead of noisy data as input and achieve similar performance with no randomness from the noise generation. (e) FGNO's performance on sleep/wake classification. A single FGNO model has improved flexibility with various layer and flow time $(l,s)$ combinations.
  • Figure 2: FGNO's performance across different layers and flow times on the DREAMT dataset. Left: Sleep classification AUROC ($\uparrow$). Right: Skin temperature regression RMSE ($\downarrow$). A darker color indicates better performance.
  • Figure 3: Model comparison in terms of size and accuracy on all four tasks of BrainTreeBank. The DeepNN's performance is quoted from Chau et al. (2024). FGNO outperforms the latest baselines in most tasks while being significantly smaller in size.
  • Figure 4: Sleep classification performance (DREAMT dataset, AUROC, %) comparing a "Noisy Input" against a "Clean Input" method across different model layers. (Left) For a representative layer, we plot performance as a function of time $s\in[0,1]$. While both the "Clean Input" and "Noisy Input" methods exhibit the same behavioral trend, the clean input approach yields consistently higher performance. (Right) At the optimal time $s\approx0.89$, the noisy method exhibits high variance over 10 runs (red-shaded region), while the clean method is deterministic and stable.
  • Figure 5: On BrainTreeBank speech classification, our FGNO model, pre-trained once on the original high-resolution data, is evaluated against MAE and Chronos baselines on inputs downsampled by various factors. FGNO consistently outperforms both baselines and shows remarkable stability across resolutions, demonstrating the benefit of learning a resolution-agnostic mapping in function space.
  • ...and 1 more figure
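
The captions above refer to STFT spectrograms and to evaluation on downsampled inputs. One way to see why a spectrogram can be comparable across sampling rates is sketched below. It is an illustration under stated assumptions, not the paper's preprocessing: the window and hop are fixed in seconds, the frequency axis is cropped at a common cutoff, and the function names, window length, hop, and cutoff are all hypothetical choices.

```python
import numpy as np

def stft_magnitude(x, fs, win_sec=0.5, hop_sec=0.25, f_max=20.0):
    """Magnitude spectrogram with window and hop fixed in seconds, cropped at f_max Hz."""
    n_win = int(round(win_sec * fs))  # samples per window depends on the sampling rate
    n_hop = int(round(hop_sec * fs))
    window = np.hanning(n_win)
    n_frames = 1 + (len(x) - n_win) // n_hop
    frames = np.stack([x[i * n_hop : i * n_hop + n_win] * window for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)  # bin spacing is 1 / win_sec Hz at any fs
    keep = freqs <= f_max + 1e-9                # small tolerance for float rounding
    return spec[:, keep], freqs[keep]

def tone(fs, freq_hz=6.0, duration_sec=4.0):
    """Toy signal: a pure sinusoid sampled at fs Hz."""
    t = np.arange(int(duration_sec * fs)) / fs
    return np.sin(2 * np.pi * freq_hz * t)

# The same 4-second signal sampled at two different rates.
spec_hi, freqs_hi = stft_magnitude(tone(fs=200), fs=200)
spec_lo, freqs_lo = stft_magnitude(tone(fs=100), fs=100)

print(spec_hi.shape, spec_lo.shape)                    # frames x frequency bins
print(freqs_hi[np.argmax(spec_hi.mean(axis=0))],
      freqs_lo[np.argmax(spec_lo.mean(axis=0))])       # dominant frequency in Hz
```

Because the window spans the same duration at both rates, the frequency bins fall on the same grid in Hz and the number of frames matches, so both signals land on a time-frequency grid of identical shape. Magnitudes are left unnormalized here, so their scale still depends on the window length in samples.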