ZUNA: Flexible EEG Superresolution with Position-Aware Diffusion Autoencoders

Christopher Warner, Jonas Mago, JR Huml, Mohamed Osman, Beren Millidge

TL;DR

ZUNA substantially improves over ubiquitous spherical-spline interpolation methods for EEG channel infilling, with the gap widening at higher dropout rates. Unlike other deep learning methods in this space, its performance generalizes across datasets and channel positions.

Abstract

We present \texttt{ZUNA}, a 380M-parameter masked diffusion autoencoder trained to perform masked channel infilling and superresolution for arbitrary electrode numbers and positions in EEG signals. The \texttt{ZUNA} architecture tokenizes multichannel EEG into short temporal windows and injects spatiotemporal structure via a 4D rotary positional encoding over (x,y,z,t), enabling inference on arbitrary channel subsets and positions. We train \texttt{ZUNA} on an aggregated and harmonized corpus spanning 208 public datasets and containing approximately 2 million channel-hours, using a combined reconstruction and heavy channel-dropout objective. We show that \texttt{ZUNA} substantially improves over ubiquitous spherical-spline interpolation methods, with the gap widening at higher dropout rates. Crucially, compared to other deep learning methods in this space, \texttt{ZUNA}'s performance \emph{generalizes} across datasets and channel positions, allowing it to be applied directly to novel datasets and problems. Despite its generative capabilities, \texttt{ZUNA} remains computationally practical for deployment. We release Apache-2.0 weights and an MNE-compatible preprocessing/inference stack to encourage reproducible comparisons and downstream use in EEG analysis pipelines.
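
To illustrate the idea of a 4D rotary positional encoding over (x,y,z,t), the following is a minimal, dependency-free sketch and not the released ZUNA implementation. It assumes the feature vector is split into four equal chunks, one per coordinate, and that each chunk is rotated pairwise as in standard 1D rotary embeddings. The function names `rope_4d` and `dot`, the frequency base of 10000, and the chunking scheme are all illustrative assumptions; the abstract does not specify these details.

```python
import math


def rope_4d(vec, pos, base=10000.0):
    """Rotate `vec` by angles derived from a 4D position (x, y, z, t).

    Assumption (not taken from the paper): the feature dimension is split
    into four equal chunks, one per coordinate, and each chunk is rotated
    pairwise with geometrically spaced frequencies, as in 1D RoPE.
    """
    if len(pos) != 4:
        raise ValueError("pos must be a 4-tuple (x, y, z, t)")
    dim = len(vec)
    if dim == 0 or dim % 8 != 0:
        raise ValueError("len(vec) must be a positive multiple of 8")

    chunk = dim // 4  # features assigned to each coordinate axis
    out = list(vec)
    for axis, coord in enumerate(pos):
        start = axis * chunk
        for k in range(chunk // 2):
            freq = base ** (-2.0 * k / chunk)
            angle = coord * freq
            c, s = math.cos(angle), math.sin(angle)
            i = start + 2 * k
            a, b = vec[i], vec[i + 1]
            # Standard 2D rotation of the feature pair (a, b).
            out[i] = a * c - b * s
            out[i + 1] = a * s + b * c
    return out


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 0.8, -0.4, 1.1, 0.9, -0.7]
    k = [1.0, 0.2, -0.6, 0.4, 0.7, -0.3, 0.1, 0.5]
    electrode_a = (0.1, 0.2, 0.3, 1.0)   # hypothetical (x, y, z, t)
    electrode_b = (-0.4, 0.5, 0.0, 3.0)  # hypothetical (x, y, z, t)
    print(dot(rope_4d(q, electrode_a), rope_4d(k, electrode_b)))
```

Under these assumptions, the dot product between a rotated query and a rotated key depends only on the difference between their (x,y,z,t) coordinates. This is the property that would let attention operate on arbitrary electrode layouts without a fixed channel index.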

Paper Structure (14 sections, 8 equations, 5 figures)

Figures (5)

  • Figure 1: Examples of dropped-out EEG signals and their reconstruction by ZUNA and MNE spherical-spline interpolation. Green: ground-truth signal that was dropped out (not presented to the model). Red: ZUNA reconstruction. Magenta: MNE spherical-spline interpolation baseline.
  • Figure 2: EEG channel configurations and spatial coverage in the training corpus. (a) Distribution of retained EEG channel counts across all non-overlapping 5-second epochs after preprocessing, comprising approximately 2 million channel-hours with up to 256 channels per sample. The majority of samples have between 16 and 32 channels, with additional peaks at higher-density configurations (64, 128, and 256 channels). Both the x- and y-axes are shown on a logarithmic scale. (b) Three-dimensional Cartesian $(x,y,z)$ scalp coordinates of EEG electrode positions for a representative subset of 64 electrodes from the standard 10–20 montage, illustrating the spatial geometry used for conditioning the model.
  • Figure 3: Diffusion Autoencoder EEG Model Architecture
  • Figure 4: Normalized mean squared error (NMSE) for channel reconstruction as a function of channel dropout rate. Performance is shown for ZUNA (green) and spherical-spline interpolation (purple) under increasing levels of channel removal, corresponding to progressively more aggressive upsampling regimes. Results are reported for the training corpus (mixed channel configurations), a held-out validation set with a fixed 32-channel montage, the ANPHY-Sleep dataset (83 channels), Berlin BCI Competition III Dataset V (32 channels), the BCI2000 motor-imagery dataset (64 channels), and the ultra–high-density AAD dataset (255 channels). Across all datasets, reconstruction error increases with dropout rate for both methods, but ZUNA consistently outperforms spherical-spline interpolation, with the performance gap widening at higher dropout levels. Notably, the difference between methods is smallest for high-density recordings (e.g., 255 channels), where spatial interpolation is less challenging, whereas for lower-density montages and higher dropout rates, ZUNA maintains relatively stable performance while spherical-spline interpolation degrades substantially.
  • Figure 5: Full sample reconstruction of a 22-channel sample from Figure 1.
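
The evaluation described in the Figure 4 caption (reconstruction error on dropped channels as a function of dropout rate) can be sketched as follows. This is a hedged illustration only: the helper names `drop_channels` and `nmse` are hypothetical, and the normalization used here (squared error divided by the energy of the ground-truth signal on the dropped channels) is an assumption, since the caption does not state the exact NMSE definition used in the paper.

```python
import random


def drop_channels(n_channels, dropout_rate, seed=0):
    """Pick a reproducible random subset of channel indices to hide."""
    if not 0.0 <= dropout_rate <= 1.0:
        raise ValueError("dropout_rate must lie in [0, 1]")
    rng = random.Random(seed)
    n_drop = round(n_channels * dropout_rate)
    return sorted(rng.sample(range(n_channels), n_drop))


def nmse(truth, recon, channels):
    """Normalized MSE over the given channels.

    Assumed definition: sum of squared errors divided by the sum of
    squared ground-truth samples, both restricted to `channels`.
    `truth` and `recon` are lists of per-channel sample lists.
    """
    num = 0.0
    den = 0.0
    for ch in channels:
        for t, r in zip(truth[ch], recon[ch]):
            num += (t - r) ** 2
            den += t ** 2
    if den == 0.0:
        raise ValueError("ground-truth energy is zero; NMSE is undefined")
    return num / den


if __name__ == "__main__":
    # Toy 4-channel recording with 3 samples per channel.
    truth = [[1.0, 2.0, 1.5], [0.5, -0.5, 0.2], [2.0, 1.0, 0.0], [-1.0, 0.3, 0.7]]
    # Stand-in "reconstruction": a trivial all-zero prediction.
    zeros = [[0.0] * 3 for _ in truth]
    for rate in (0.25, 0.5, 0.75):
        dropped = drop_channels(len(truth), rate)
        print(rate, dropped, nmse(truth, zeros, dropped))
```

Under this definition a perfect reconstruction scores 0 and an all-zero prediction scores 1, so values below 1 indicate that a method recovers at least some of the dropped signal. Any interpolation or model output could be substituted for the all-zero stand-in.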