PSDNorm: Test-Time Temporal Normalization for Deep Learning in Sleep Staging

Théo Gnassounou, Antoine Collas, Rémi Flamary, Alexandre Gramfort

TL;DR

PSDNorm tackles distribution shifts in sleep staging caused by subject- and device-level variability by introducing a test-time normalization layer, based on the power spectral density (PSD), that leverages temporal correlations. It combines PSD estimation via Welch's method, a running Riemannian barycenter of the PSDs, and an $F$-Monge mapping that aligns intermediate representations to a geodesic barycenter, effectively whitening and recoloring feature maps in the frequency domain. The method generalizes InstanceNorm (recovered at $F=1$ with identity recoloring) and acts as a drop-in layer that performs test-time domain adaptation without retraining. Large-scale experiments across 10 sleep datasets with about $10^4$ subjects demonstrate state-of-the-art performance and improved data efficiency, with PSDNorm ranking first in most settings and remaining robust across architectures such as U-Sleep and CNNTransformer. PSDNorm thus offers a practical, data-efficient solution for domain shift in physiological signals and holds promise for biomedical applications beyond sleep staging.
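The whitening-and-recoloring step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `welch_psd` and `psd_norm`, the Hann window, the 50% overlap, the explicit centering step, and the `eps` guard are all hypothetical choices made for the example. The filter's frequency response is taken as the elementwise square root of the ratio between a target PSD and the estimated PSD, which is the usual form of a Monge mapping between stationary Gaussian signals.

```python
import numpy as np

def welch_psd(x, F):
    """Welch-style two-sided PSD per channel; x has shape (c, l), output (c, F)."""
    hop = max(1, F // 2)                      # 50% overlap (assumed)
    window = np.hanning(F)                    # Hann window (assumed)
    starts = range(0, x.shape[1] - F + 1, hop)
    # Windowed segments of length F, stacked as (c, n_segments, F).
    segments = np.stack([x[:, s:s + F] * window for s in starts], axis=1)
    periodograms = np.abs(np.fft.fft(segments, axis=-1)) ** 2
    # Average the periodograms over segments and normalize by window energy.
    return periodograms.mean(axis=1) / np.sum(window ** 2)

def psd_norm(x, target_psd, F, eps=1e-12):
    """Filter each channel so its PSD moves toward target_psd (shape (c, F))."""
    xc = x - x.mean(axis=1, keepdims=True)    # centering step (assumed)
    psd = welch_psd(xc, F)
    # Whitening (divide by sqrt(psd)) followed by recoloring (multiply by sqrt(target)).
    H = np.sqrt(target_psd / (psd + eps))
    # Length-F impulse response per channel, zero lag moved to the middle.
    h = np.fft.fftshift(np.fft.ifft(H, axis=-1).real, axes=-1)
    return np.stack([np.convolve(xc[i], h[i], mode="same")
                     for i in range(x.shape[0])])
```

With `F=1` and a target PSD of ones, the filter reduces to a single tap equal to the inverse standard deviation, so the sketch returns the per-channel standardized signal, which matches the InstanceNorm special case mentioned in the text. Larger values of `F` give longer filters that account for more temporal correlation.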

Abstract

Distribution shift poses a significant challenge in machine learning, particularly in biomedical applications using data collected across different subjects, institutions, and recording devices, such as sleep data. While existing normalization layers (BatchNorm, LayerNorm, and InstanceNorm) help mitigate distribution shifts, when applied over the time dimension they ignore the dependencies and auto-correlation inherent to the vector coefficients they normalize. In this paper, we propose PSDNorm, which leverages Monge mapping and temporal context to normalize feature maps in deep learning models for signals. Notably, the proposed method operates as a test-time domain adaptation technique, addressing distribution shifts without additional training. Evaluations with architectures based on U-Net or transformer backbones, trained on 10K subjects across 10 datasets, show that PSDNorm achieves state-of-the-art performance on unseen left-out datasets while being 4 times more data-efficient than BatchNorm.

Paper Structure

This paper contains 51 sections, 1 theorem, 15 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

Proposition A.1

Let ${\boldsymbol\Sigma}^{(s)}$ and ${\boldsymbol\Sigma}^{(t)}$ be two covariance matrices in $\mathbb{R}^{cF \times cF}$ following eq:covariance_structure. Let us denote $\mathbf{P}^{(s)}$ and $\mathbf{P}^{(t)}$ the corresponding PSD matrices. The geodesic associated with the Bures-Wasserstein metr

Figures (7)

  • Figure 1: Description of normalization layers. The input shape is $(N, c, \ell)$ with batch size $N$, channels $c$, and signal length $\ell$. BatchNorm estimates the mean $\widehat{\mu}$ and variance $\widehat{\sigma}^2$ over batch and time, and learns parameters $(\gamma, \beta)$ to normalize the input. PSDNorm estimates PSDs $\widehat{\mathbf{P}}$ over time and accounts for local temporal correlations. It computes the barycenter PSD $\widehat{\overline{\mathbf{P}}}$, updates it via a running Riemannian barycenter (eq:running_barycenter_method), and applies the filter $\widehat{\mathbf{H}}$ to normalize the input. The hyperparameter $F$ controls the extent of temporal correlation considered, thereby adjusting the strength of the normalization.
  • Figure 2: Description of the running Riemannian barycenter. The barycenter of the batch $\widehat{\overline{\mathbf{P}}}_\mathcal{B}$ is estimated from the PSD of each batch sample. Then the running Riemannian barycenter is updated through an exponential average along the geodesic (dashed line), parameterized by $\alpha \in [0,1]$.
  • Figure 3: Critical Difference (CD) diagram for two architectures on datasets balanced @400. Average ranks across datasets and subjects for U-Sleep and CNNTransformer. Black lines connect methods that are not significantly different.
  • Figure 4: Performance of PSDNorm and BatchNorm with varying training set sizes. The BACC score is plotted against the number of training subjects used with U-Sleep.
  • Figure 5: Subject-wise BACC comparison on MASS and CHAT (balanced @400). Blue dots indicate an improvement with PSDNorm.
  • ...and 2 more figures
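
The running barycenter update described in the Figure 2 caption can be illustrated with a short NumPy sketch. It assumes that each PSD is stored as an array of positive values per channel and frequency bin, so that the Bures-Wasserstein geodesic between two PSDs reduces to linear interpolation of their elementwise square roots, as it does for commuting covariance matrices. The function names `batch_barycenter` and `geodesic_update` are hypothetical, and the convention that $\alpha = 0$ keeps the running estimate unchanged is an assumption made for the example.

```python
import numpy as np

def batch_barycenter(psds):
    """Barycenter of a batch of PSDs with shape (N, c, F); returns (c, F).

    Assumes the barycenter is the squared mean of the square roots.
    """
    return np.mean(np.sqrt(psds), axis=0) ** 2

def geodesic_update(running, batch_bary, alpha):
    """Move the running barycenter toward the batch barycenter along the geodesic.

    alpha = 0 keeps the running estimate, alpha = 1 replaces it (assumed convention).
    """
    return ((1 - alpha) * np.sqrt(running) + alpha * np.sqrt(batch_bary)) ** 2
```

For example, with a running value of 4 and a batch barycenter of 16 in a given frequency bin, `alpha = 0.5` gives $(0.5 \cdot 2 + 0.5 \cdot 4)^2 = 9$, which differs from the arithmetic mean of 10 because the interpolation happens on the square roots.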

Theorems & Definitions (2)

  • Proposition A.1
  • proof