Learning Time-Scale Invariant Population-Level Neural Representations

Eshani Patel, Yisong Yue, Geeling Chau

TL;DR

The paper addresses the challenge of generalizing population-level neural representations across varying input time-scales in iEEG data. It introduces Time-scale Augmented Pretraining (TSAP) on top of the Population Transformer (PopT), using a BrainBERT temporal encoder, and exposes the model to multiple interval lengths during pretraining. Across Word Onset and Sentence Onset tasks on BrainTreeBank data, TSAP closes the performance gap caused by time-scale mismatches and often exceeds optimally matched baselines, even for unseen scales such as held-out 3-second intervals. Embedding analyses show that TSAP reduces time-scale clustering in the representation space, supporting more invariant, transferable population-level neural representations for neuroscience and brain–computer interface applications.

Abstract

General-purpose foundation models for neural time series can help accelerate neuroscientific discoveries and enable applications such as brain-computer interfaces (BCIs). A key component in scaling these models is population-level representation learning, which leverages information across channels to capture spatial as well as temporal structure. Population-level approaches have recently shown that such representations can be both efficient to learn on top of pretrained temporal encoders and useful for decoding a variety of downstream tasks. However, these models remain sensitive to mismatches in preprocessing, particularly in time-scales, between pretraining and downstream settings. We systematically examine how time-scale mismatches affect generalization and find that existing representations lack invariance. To address this, we introduce Time-scale Augmented Pretraining (TSAP), which consistently improves robustness to different time-scales across decoding tasks and builds invariance in the representation space. These results highlight handling preprocessing diversity as a key step toward building generalizable neural foundation models.
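
As a rough illustration of what exposing a model to several interval lengths during pretraining can look like at the data-loading level, the sketch below crops windows of randomly chosen duration from a multichannel recording. It is not the authors' implementation: the function name `sample_multiscale_window`, the set of interval lengths, the sampling rate, and the synthetic recording are all assumptions made for this example.

```python
import numpy as np

# Hypothetical interval lengths in seconds. The summary above only mentions
# 5-second training intervals and a held-out 3-second interval, so this
# particular set is an assumption, not the paper's configuration.
INTERVAL_LENGTHS_S = (1.0, 2.0, 5.0)


def sample_multiscale_window(recording, sample_rate_hz, rng,
                             interval_lengths_s=INTERVAL_LENGTHS_S):
    """Crop one window of randomly chosen duration from a (channels, time) array.

    Returns the cropped window and the duration (in seconds) that was drawn.
    """
    length_s = float(rng.choice(interval_lengths_s))
    n_samples = int(round(length_s * sample_rate_hz))
    n_total = recording.shape[1]
    if n_samples > n_total:
        raise ValueError("recording is shorter than the requested interval")
    # Upper bound of rng.integers is exclusive, so the last valid start is included.
    start = int(rng.integers(0, n_total - n_samples + 1))
    return recording[:, start:start + n_samples], length_s


# Synthetic stand-in for a recording: 4 channels, 10 s at 2 kHz (made-up values).
rng = np.random.default_rng(0)
recording = rng.standard_normal((4, 20_000))
windows = [sample_multiscale_window(recording, 2_000, rng) for _ in range(8)]
```

Each pretraining batch built this way mixes windows of different durations, so a downstream model that consumes them would need to handle variable-length inputs, for example by padding or by pooling per-channel temporal embeddings.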

Paper Structure

This paper contains 11 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Schematic of our approach. (a) Previous population-level transformers (PopT; Chau et al., 2025) pretrain on fixed temporal windows, which raises the question of how sensitive these models are to inputs of different time-scales. (b) Our approach seeks to provide optimal performance for any input length. (c) Time-scale Augmented Pretraining (TSAP) uses samples from varying input interval lengths to achieve invariance to input time-scales.
  • Figure 2: Performance drop from mismatch in input time-scales is recovered by TSAP. (a) Compared to the optimal (dotted line), models (x-axis) trained with mismatched time-scales perform much worse (below the line), while TSAP (dark blue) generally improves upon the optimal baseline. Shown are the Word Onset ROC AUC difference means and standard error across subjects and 5 seeds. (b) We see TSAP (dark blue) match or outperform all other models across all input lengths. Shown are the Word Onset ROC AUC mean and standard error across subjects and 5 seeds.
  • Figure 3: PCA analysis of raw embeddings and CLS tokens. (a) PCA projection of temporal embeddings taken from different time-scales (colors) from 1 subject and 1 session from the Word Onset task. Temporal embeddings tend to cluster by interval length even though they come from the same 100 samples. (b) PCA of CLS tokens after training with 5-second intervals only. We again see strong clustering by time-scale, with K-Means clusters identified for each ("X" marks). (c) PCA of CLS tokens after training with TSAP. We see that TSAP CLS tokens across several time-scales are clustered more closely together, with confused K-Means clusters ("X" marks).
  • Figure 4: Performance drop from mismatch in input time-scales is recovered by TSAP. (a) Compared to the optimal (dotted line), models (x-axis) trained with mismatched time-scales perform much worse (below the line), while TSAP (dark blue) generally improves upon the optimal baseline. Shown are the Sentence Onset relative ROC AUC difference means and standard error across subjects and 5 seeds. (b) We see TSAP (dark blue) closely match or outperform other models across all input time-scales. Shown are the Sentence Onset ROC AUC mean and standard error across subjects and 5 seeds.
  • Figure 5: Confusion matrices following K-Means clustering of CLS tokens for (a) the 5s-pretrained PopT and (b) the TSAP model. We see clean clustering in the 5s-pretrained model, but much more confusion in the TSAP version.
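
The clustering analysis described in the captions for Figures 3 and 5 can be sketched as follows: cluster embeddings with K-Means, then count how often each time-scale lands in each cluster. This is a minimal NumPy-only sketch, not the authors' analysis code. The helper names, the deterministic farthest-point initialization, and the synthetic 2-D "embeddings" are all assumptions chosen to keep the example self-contained and reproducible.

```python
import numpy as np


def nearest_center(points, centers):
    """Index of the closest center for every row of `points`."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def farthest_point_init(points, k):
    """Deterministic init: start at row 0, then repeatedly add the farthest point."""
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    return np.array(centers, dtype=float)


def kmeans_labels(points, k, n_iters=20):
    """Lloyd's algorithm; returns one cluster index per row of `points`."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iters):
        labels = nearest_center(points, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return nearest_center(points, centers)


def scale_cluster_confusion(scale_ids, cluster_ids, k):
    """Rows: true time-scale index. Columns: K-Means cluster (arbitrary order)."""
    counts = np.zeros((k, k), dtype=int)
    for s, c in zip(scale_ids, cluster_ids):
        counts[s, c] += 1
    return counts


rng = np.random.default_rng(0)
scale_ids = np.repeat(np.arange(3), 100)  # 3 hypothetical time-scales, 100 samples each

# Case 1: embeddings separated by time-scale (stand-in for a fixed-scale model).
offsets = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
separated_emb = offsets[scale_ids] + 0.5 * rng.standard_normal((300, 2))
separated = scale_cluster_confusion(scale_ids, kmeans_labels(separated_emb, 3), 3)

# Case 2: embeddings that ignore time-scale (stand-in for an invariant model).
mixed_emb = rng.standard_normal((300, 2))
mixed = scale_cluster_confusion(scale_ids, kmeans_labels(mixed_emb, 3), 3)
```

In the separated case every row of the matrix concentrates in a single column, which corresponds to the "clean clustering" reading of a confusion matrix. In the mixed case the counts spread across columns, which is the kind of confusion the captions associate with more time-scale-invariant representations.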