Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Luca Cosmo, Taketo Akama
TL;DR
This work tackles the reconstruction of naturalistic music from non-invasive EEG signals using latent diffusion models conditioned through ControlNet adapters. The proposed EEG-conditioned pipeline builds on AudioLDM2, applying a lightweight projector to align EEG data with the diffusion model's latent space while keeping pre-processing minimal. The evaluation relies on neural embedding-based metrics (CLAP and EnCodec) and Fréchet distances to capture semantic audio attributes despite the low temporal resolution of EEG. The method improves over a convolutional baseline and shows promising performance on held-out tracks. The study demonstrates the feasibility of non-invasive brain-to-audio reconstruction for complex musical stimuli and points to the need for larger datasets and further methodological refinements to improve distributional generalization and real-time applicability.
Abstract
In this article, we explore the potential of latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Rather than simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, we focus on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial foray into high-quality general music reconstruction from non-invasive EEG data, employing an end-to-end training approach directly on raw data, without manual pre-processing or channel selection. We train our models on the public NMED-T dataset and perform a quantitative evaluation, proposing neural embedding-based metrics. Our work contributes to ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data to reconstruct complex auditory information.
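To make the idea of an embedding-based metric concrete, the sketch below computes a Fréchet distance between two sets of audio embeddings, each set modelled as a Gaussian. It is a minimal illustration of the standard formula, not necessarily the exact evaluation code used in this work. The function name `frechet_distance` and the array shapes are assumptions, and the embeddings are taken as given (for example, precomputed with an encoder such as CLAP or EnCodec).

```python
import numpy as np


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (num_clips, embedding_dim); assumed to be
    precomputed audio embeddings (hypothetical inputs for illustration).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

A lower value indicates that the distribution of reconstructed-track embeddings is closer to that of the reference tracks. Because the distance compares distributions rather than aligned samples, it does not require sample-level temporal correspondence between the reconstructed and reference audio.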
