Structure-Preserving Transformers for Sequences of SPD Matrices

Mathieu Seraphim; Alexis Lechervy; Florian Yger; Luc Brun; Olivier Etard

TL;DR

This work tackles learning from sequences of Symmetric Positive Definite (SPD) matrices while preserving their Riemannian geometry throughout processing. It introduces SP-MHA, a structure-preserving multihead attention mechanism based on LogEuclidean mappings and triangular linear maps, and integrates it into SPDTransNet, a model for EEG sleep staging. The approach preserves SPD structure across all Transformer components and achieves state-of-the-art macro-F1 and N1-F1 scores on the MASS SS3 dataset, outperforming several baselines; ablations confirm the benefit of structure preservation. The method offers a principled, geometry-aware framework for SPD-valued data, with potential applicability beyond sleep staging to other domains that require manifold-consistent sequence modeling.

Abstract

In recent years, Transformer-based self-attention mechanisms have been successfully applied to the analysis of a variety of context-reliant data types, from texts to images and beyond, including data from non-Euclidean geometries. In this paper, we present such a mechanism, designed to classify sequences of Symmetric Positive Definite matrices while preserving their Riemannian geometry throughout the analysis. We apply our method to automatic sleep staging on time series of EEG-derived covariance matrices from a standard dataset, obtaining high levels of stage-wise performance.
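
To make the LogEuclidean mapping named in the TL;DR more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a common convention in which the matrix logarithm of an SPD matrix is vectorized from its upper triangle, with off-diagonal entries scaled by sqrt(2) so that the Frobenius norm is kept. The helper names (`spd_log`, `sym_exp`, `sym_to_vec`, `vec_to_sym`, `random_spd`) and the fixed mixing weights, which stand in for one row of attention scores, are hypothetical.

```python
import numpy as np


def spd_log(mat):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def sym_exp(mat):
    """Matrix exponential of a symmetric matrix; the result is SPD."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.T


def sym_to_vec(sym):
    """Flatten the upper triangle, scaling off-diagonals by sqrt(2) (assumed convention)."""
    rows, cols = np.triu_indices(sym.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return sym[rows, cols] * scale


def vec_to_sym(vec, n):
    """Inverse of sym_to_vec: rebuild the n x n symmetric matrix."""
    rows, cols = np.triu_indices(n)
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    sym = np.zeros((n, n))
    sym[rows, cols] = vec / scale
    sym[cols, rows] = vec / scale
    return sym


def random_spd(n, rng):
    """Hypothetical test data: a well-conditioned random SPD matrix."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)


rng = np.random.default_rng(0)
n = 4
seq = [random_spd(n, rng) for _ in range(3)]

# One token per matrix: n * (n + 1) / 2 = 10 coordinates each.
tokens = np.stack([sym_to_vec(spd_log(m)) for m in seq])

# Illustrative weights summing to 1, standing in for one softmax row.
weights = np.array([0.2, 0.5, 0.3])
mixed = weights @ tokens

# Any linear combination of tokens maps back to a valid SPD matrix.
result = sym_exp(vec_to_sym(mixed, n))
```

The design point this illustrates is that a weighted sum of such tokens is still the vectorization of a symmetric matrix, so mapping it back through the exponential always yields an SPD matrix.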

Paper Structure (10 sections, 4 equations, 2 figures, 1 table)

Introduction
SPD Structure-Preserving Attention
Structure-Preserving Multihead Attention (SP-MHA)
Triangular linear maps
Application to EEG Sleep Staging
The stakes of automatic sleep staging
Our preprocessing
The SPDTransNet model
Experiments & Results
Conclusion

Figures (2)

Figure 1: The SP-MHA architecture. In parentheses are tensor dimensions at every step, with $N$ the batch size.
Figure 2: SPDTransNet global architecture, with $t=3$ feature tokens per epoch.
