Adapting Neural Audio Codecs to EEG
Ard Kastrati, Luca Lanzendörfer, Riccardo Rigoni, John Staib Matilla, Roger Wattenhofer
TL;DR
This work demonstrates that pretrained neural audio codecs can effectively compress EEG when DAC is repurposed with EEG-specific preprocessing and fine-tuning. It introduces a multi-channel extension (DAC-MC) that exploits cross-channel correlations through attention and channel-conditioned decoding while keeping the audio-pretrained initialization. Evaluations on TUAB and TUEP show that fine-tuned DAC reconstructs EEG better than a codec trained from scratch and preserves clinically relevant information, with DAC-MC offering further gains for epilepsy detection. The study also maps key compression choices (codebook depth, vocabulary size, and sampling rate) to reconstruction quality and downstream task performance, outlining a practical path to discrete, scalable EEG representations.
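The compression choices listed above combine into a bitrate through simple arithmetic. Each codec frame covers one encoder stride (hop length) of input samples and emits one index per residual codebook, and each index costs log2 of the codebook size in bits. The sketch below assumes a fixed total stride and no entropy coding; the function name `codec_bitrate` and the EEG settings in the sweep are hypothetical illustrations, not the configurations evaluated in this work.

```python
import math


def codec_bitrate(sample_rate_hz: float, hop_length: int,
                  n_codebooks: int, codebook_size: int) -> float:
    """Bits per second for ONE channel of a residual-VQ codec (no entropy coding).

    Assumes every frame spans `hop_length` input samples and emits one index
    per codebook, each index costing log2(codebook_size) bits.
    """
    if hop_length <= 0 or n_codebooks <= 0 or codebook_size < 2:
        raise ValueError("invalid codec configuration")
    frames_per_second = sample_rate_hz / hop_length
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    return frames_per_second * bits_per_frame


if __name__ == "__main__":
    # Hypothetical sweep: 256 Hz EEG, 512-sample stride, varying depth and vocabulary.
    raw_bps = 256 * 16  # uncompressed 16-bit samples at 256 Hz, one channel
    for depth in (1, 4, 9):
        for vocab in (256, 1024):
            bps = codec_bitrate(256, 512, depth, vocab)
            print(f"depth={depth:2d} vocab={vocab:5d} -> {bps:6.1f} bit/s "
                  f"(~{raw_bps / bps:.0f}x smaller than 16-bit PCM)")
```

For comparison, the public 44.1 kHz DAC model is usually described with a 512-sample stride, 9 codebooks, and 1024 entries per codebook, which this formula puts at roughly 7.75 kbps. Under the same stride, a 256 Hz EEG channel yields only 0.5 frames per second, so the input sampling rate sets the temporal resolution of the tokens as much as codebook depth and size set their precision.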
Abstract
EEG and audio are inherently distinct modalities, differing in sampling rate, channel structure, and scale. Yet we show that pretrained neural audio codecs can serve as effective starting points for EEG compression, provided the data are preprocessed to satisfy the codec's input constraints. Using DAC, a state-of-the-art neural audio codec, as our base, we demonstrate that raw EEG can be mapped into the codec's stride-based framing, enabling direct reuse of the audio-pretrained encoder-decoder. Even without modification, this setup yields stable EEG reconstructions, and fine-tuning on EEG data further improves fidelity and generalization compared to training from scratch. We systematically explore compression-quality trade-offs by varying residual codebook depth, codebook (vocabulary) size, and input sampling rate. To capture spatial dependencies across electrodes, we propose DAC-MC, a multi-channel extension with attention-based cross-channel aggregation and channel-specific decoding that retains the audio-pretrained initialization. Evaluations on the TUH Abnormal and Epilepsy datasets show that the adapted codecs preserve clinically relevant information, as reflected in spectrogram-based reconstruction loss and downstream classification accuracy.
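A minimal sketch of one plausible preprocessing step, under stated assumptions. Each electrode is fed to the codec as an independent mono signal, amplitudes are rescaled per channel into [-1, 1] like normalized audio, and the time axis is zero-padded to a multiple of the encoder's total stride so that every frame is complete. The 512-sample stride, the peak normalization, and the helper names `eeg_to_codec_batch` and `codec_batch_to_eeg` are hypothetical choices for illustration. They are not the paper's exact pipeline, and this channel-independent layout corresponds to single-channel DAC rather than DAC-MC.

```python
import numpy as np

HOP_LENGTH = 512  # hypothetical total encoder stride; depends on the DAC variant


def eeg_to_codec_batch(eeg, hop_length=HOP_LENGTH):
    """Turn a (channels, samples) EEG array into a (channels, 1, padded) mono batch.

    Returns the batch plus the per-channel statistics needed to undo the mapping.
    """
    eeg = np.asarray(eeg, dtype=np.float32)
    if eeg.ndim != 2:
        raise ValueError("expected a (channels, samples) array")
    # Remove each channel's DC offset, then scale its peak to 1 (audio-like range).
    mean = eeg.mean(axis=1, keepdims=True)
    centered = eeg - mean
    peak = np.abs(centered).max(axis=1, keepdims=True)
    peak = np.where(peak > 0, peak, 1.0).astype(np.float32)  # guard flat channels
    scaled = centered / peak
    # Zero-pad on the right so the length is a whole number of codec frames.
    n_samples = eeg.shape[1]
    n_pad = (-n_samples) % hop_length
    padded = np.pad(scaled, ((0, 0), (0, n_pad)))
    # Each electrode becomes one batch item with a single "audio" channel.
    return padded[:, np.newaxis, :], mean, peak, n_samples


def codec_batch_to_eeg(batch, mean, peak, n_samples):
    """Invert eeg_to_codec_batch: crop the padding and restore amplitudes."""
    return batch[:, 0, :n_samples] * peak + mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for EEG: 19 channels, 2500 samples (10 s at 250 Hz).
    eeg = rng.normal(scale=50.0, size=(19, 2500))
    batch, mean, peak, n = eeg_to_codec_batch(eeg)
    print(batch.shape)  # (19, 1, 2560)
    restored = codec_batch_to_eeg(batch, mean, peak, n)
    print(np.abs(restored - eeg).max())
```

Keeping the per-channel offset and scale makes the mapping invertible, so codec outputs can be returned to the original amplitude range before computing reconstruction losses or downstream features.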
