Adapting Neural Audio Codecs to EEG

Ard Kastrati, Luca Lanzendörfer, Riccardo Rigoni, John Staib Matilla, Roger Wattenhofer

TL;DR

This work demonstrates that pretrained neural audio codecs can effectively compress EEG by repurposing DAC with EEG-specific preprocessing and fine-tuning. It introduces a multi-channel extension (DAC-MC) that exploits cross-channel correlations through attention and channel-conditioned decoding while keeping the audio-based initialization. Evaluations on the TUH Abnormal (TUAB) and TUH Epilepsy (TUEP) datasets show that fine-tuned DAC reconstructs EEG better than the same codec trained from scratch and preserves clinically relevant information, with DAC-MC offering added gains for epilepsy detection. The study also maps three key compression choices (codebook depth, vocabulary size, and sampling rate) to reconstruction quality and downstream task performance, outlining a practical path to discrete, scalable EEG representations.

Abstract

EEG and audio are inherently distinct modalities, differing in sampling rate, channel structure, and scale. Yet we show that pretrained neural audio codecs can serve as effective starting points for EEG compression, provided that the data are preprocessed to fit the codec's input constraints. Using DAC, a state-of-the-art neural audio codec, as our base, we demonstrate that raw EEG can be mapped into the codec's stride-based framing, enabling direct reuse of the audio-pretrained encoder-decoder. Even without modification, this setup yields stable EEG reconstructions, and fine-tuning on EEG data further improves fidelity and generalization compared to training from scratch. We systematically explore compression-quality trade-offs by varying residual codebook depth, codebook (vocabulary) size, and input sampling rate. To capture spatial dependencies across electrodes, we propose DAC-MC, a multi-channel extension with attention-based cross-channel aggregation and channel-specific decoding that retains the audio-pretrained initialization. Evaluations on the TUH Abnormal and Epilepsy datasets show that the adapted codecs preserve clinically relevant information, as reflected in spectrogram-based reconstruction loss and downstream classification accuracy.
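
The three compression knobs named in the abstract (residual codebook depth, codebook size, and input sampling rate) jointly determine the token bitrate of a residual-vector-quantization codec. The sketch below shows that arithmetic and the padding implied by stride-based framing. It is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the numeric values (stride 512, 9 codebooks, 1024 entries, 256 Hz) are illustrative placeholders rather than settings reported in the paper.

```python
import math


def frames_for(n_samples: int, stride: int) -> int:
    """Latent frames produced when the signal is right-padded to a multiple of the stride."""
    return math.ceil(n_samples / stride)


def padded_length(n_samples: int, stride: int) -> int:
    """Length of the signal after padding so that it fits the stride-based framing."""
    return frames_for(n_samples, stride) * stride


def bitrate_bps(sampling_rate_hz: float, stride: int,
                n_codebooks: int, codebook_size: int) -> float:
    """Bits per second per channel for a residual VQ codec.

    Each latent frame emits one index per residual codebook, and each
    index costs log2(codebook_size) bits.
    """
    frame_rate_hz = sampling_rate_hz / stride
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    return frame_rate_hz * bits_per_frame


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the paper).
    print(padded_length(2500, 512))        # samples after padding
    print(bitrate_bps(256, 512, 9, 1024))  # deeper codebook stack
    print(bitrate_bps(256, 512, 4, 1024))  # shallower stack, lower bitrate
```

Under these assumptions, reducing the number of residual codebooks or the sampling rate lowers the bitrate linearly, while shrinking the codebook lowers it only logarithmically.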

Paper Structure

This paper contains 21 sections, 3 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: DAC-MC. Purple modules form the pretrained DAC backbone.
  • Figure 2: Example reconstructions from the audio-pretrained codec and from the codec fine-tuned on EEG data.
  • Figure 3: Comparison of EEG codec adaptations. (a) Losses for different training strategies. (b) Trade-offs with residual codebooks. (c) Sampling rate and alphabet size effects.
  • Figure 4: Random subset of 4 channels with pivot Cz on the 10-20 system. On the left, the induced probability distribution from pivot Cz. On the right, the sampled channels. Adapted from wikipedia_1020_2010. Public domain.
  • Figure 5: Example reconstruction with the fine-tuned multi-channel codec.