STAMP: Spatial-Temporal Adapter with Multi-Head Pooling

Brad Shook, Abby Turner, Jieshi Chen, Michał Wiliński, Mononito Goswami, Jonathan Elmer, Artur Dubrawski

TL;DR

STAMP is a lightweight Spatial-Temporal Adapter that sits on top of frozen time series foundation models (TSFMs) and tackles EEG classification by encoding spatial and temporal structure. It combines positional encodings, a criss-cross gated MLP (CC-GMLP), and multi-head attention pooling (MHAP) to convert univariate TSFM embeddings into discriminative features. On eight EEG benchmarks its performance rivals that of EEG foundation models while using only about $7.5\times 10^5$ trainable parameters. Ablation studies show that positional encoding and token mixing are essential, and the approach works across different TSFMs, which underlines the practicality of reusing large pre-trained time-series models for EEG without extensive fine-tuning. These findings suggest that STAMP could extend to other multivariate time-series domains, offering a scalable, interpretable path to using general-purpose foundation models in specialized biomedical tasks.

Abstract

Time series foundation models (TSFMs) pretrained on data from multiple domains have shown strong performance on diverse modeling tasks. Various efforts have been made to develop foundation models specific to electroencephalography (EEG) data, which records brain electrical activity as time series. However, no comparative analysis of EEG-specific foundation models (EEGFMs) versus general TSFMs has been performed on EEG-specific tasks. We introduce a novel Spatial-Temporal Adapter with Multi-Head Pooling (STAMP), which leverages univariate embeddings produced by a general TSFM, implicitly models spatial-temporal characteristics of EEG data, and achieves performance comparable to state-of-the-art EEGFMs. A comprehensive analysis is performed on 8 benchmark datasets of clinical tasks using EEG for classification, along with ablation studies. Our proposed adapter is lightweight in trainable parameters and flexible in the inputs it can accommodate, supporting easy modeling of EEG data using TSFMs.
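To make the pooling idea concrete, the following is a minimal sketch of generic multi-head attention pooling over a set of token embeddings. It is not the authors' MHAP implementation: the per-head query vectors, the scaled dot-product scoring, the concatenation of heads, and all names (`attention_pool`, `queries`, the toy sizes) are assumptions made for illustration, and the projection to class predictions is omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tokens, queries):
    """Pool N token embeddings into one vector per head (illustrative only).

    tokens:  list of N embeddings, each a list of D floats
    queries: list of H query vectors (hypothetical learnable parameters)
    returns: (pooled, weights), where pooled is a flat list of H * D floats
             and weights holds one length-N attention distribution per head
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    pooled, weights = [], []
    for q in queries:
        # Scaled dot-product score between this head's query and every token.
        scores = [scale * sum(qi * xi for qi, xi in zip(q, x)) for x in tokens]
        w = softmax(scores)
        weights.append(w)
        # Attention-weighted sum of the tokens for this head.
        head = [sum(w[i] * tokens[i][d] for i in range(len(tokens)))
                for d in range(dim)]
        pooled.extend(head)  # heads are concatenated
    return pooled, weights


# Toy input (assumed sizes): 3 channels x 4 time patches = 12 tokens of dim 8.
random.seed(0)
num_channels, num_patches, dim, num_heads = 3, 4, 8, 2
tokens = [[random.gauss(0, 1) for _ in range(dim)]
          for _ in range(num_channels * num_patches)]
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_heads)]
pooled, weights = attention_pool(tokens, queries)
```

In this sketch a head whose query is all zeros assigns equal weight to every token, so it reduces to mean pooling; a non-zero query lets the head emphasize some channel-time tokens over others.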

Paper Structure

This paper contains 31 sections, 3 equations, 16 figures, 7 tables.

Figures (16)

  • Figure 1: A diagram showing how EEG data is processed by MOMENT and STAMP. The EEG data is separated into tokens, which are embedded using MOMENT before positional encoding is applied. The resulting tokens are passed through the CC-GMLP, where spatial and temporal relationships are incorporated into embeddings. MHAP then determines relevant features and generates final predictions by projecting embeddings into lower dimensional spaces.
  • Figure 2: Performance comparison between four positional encoding options: No PE (0.71M), PE-N (0.73M), PE-ST (0.72M), and PE-NST (0.74M). The value in parentheses indicates the average number of trainable parameters across the 4 datasets.
  • Figure 3: Performance comparison between four different token mixer options: B-GMLP (0.79M), CC-GMLP (0.74M), B-TF (1.25M), and CC-TF (0.99M). The value in parentheses indicates the average number of trainable parameters across the 4 datasets.
  • Figure 4: Performance comparison between token aggregation strategies: mean pooling (0.70M) and MHAP (0.74M). The value in parentheses indicates the average number of trainable parameters across the 4 datasets.
  • Figure 5: Performance comparison in the full evaluation across 5 methods: STAMP (0.74M), CBraMod (29M), LaBraM (5.8M), ST-Transformer (3.5M), and EEG Conformer (0.55M). The value in parentheses indicates the average number of trainable parameters across the 4 datasets.
  • ...and 11 more figures