Naturalistic Music Decoding from EEG Data via Latent Diffusion Models

Emilian Postolache; Natalia Polouliakh; Hiroaki Kitano; Akima Connelly; Emanuele Rodolà; Luca Cosmo; Taketo Akama

TL;DR

This work tackles reconstructing naturalistic music from non-invasive EEG signals using latent diffusion models conditioned by ControlNet adapters. The proposed EEG-conditioned pipeline builds on AudioLDM2, applying a lightweight projector to align EEG data with the diffusion model's latent space while keeping pre-processing minimal. Evaluation relies on neural-embedding-based metrics (CLAP and EnCodec) and Fréchet distances to capture semantic audio attributes despite low EEG temporal resolution, reporting improvements over a convolutional baseline and showing promising performance on held-out tracks. The study demonstrates the feasibility of non-invasive brain-to-audio reconstruction for complex musical stimuli and points to the need for larger datasets and further methodological refinements to enhance distributional generalization and real-time applicability.
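As a rough illustration of the embedding-based evaluation mentioned above, the following sketch computes a Fréchet distance between two sets of embedding vectors by fitting a Gaussian to each set. It is a generic NumPy version, not the authors' evaluation code. The function name `frechet_distance` and the random arrays standing in for CLAP or EnCodec embeddings are assumptions made for this example.

```python
import numpy as np


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (num_samples, embedding_dim).
    Hypothetical helper for illustration; not taken from the paper.
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise, hence the clip).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder data: in practice these would be embeddings of
    # ground-truth and decoded audio chunks.
    real = rng.normal(size=(500, 16))
    decoded = rng.normal(loc=0.5, size=(500, 16))
    print(frechet_distance(real, decoded))
```

Lower values indicate that the two embedding distributions are closer. The estimate is only meaningful when each set holds clearly more samples than the embedding dimension, since the covariance matrices are otherwise poorly conditioned.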

Abstract

In this article, we explore the potential of latent diffusion models, a family of powerful generative models, for reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study is an initial foray into high-quality general music reconstruction from non-invasive EEG data, employing an end-to-end training approach directly on raw data without the need for manual pre-processing or channel selection. We train our models on the public NMED-T dataset and propose neural embedding-based metrics for quantitative evaluation. Our work contributes to ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data to reconstruct complex auditory information.

Paper Structure (12 sections, 5 equations, 3 figures, 1 table)

Introduction
Background
Latent Diffusion Models
ControlNet
Method
EEG-Conditioned ControlNet
Neural Embedding-Based Metrics
Experimental Setup
Dataset
Implementation Details
Experiments
Conclusions

Figures (3)

Figure 1: Illustration of the proposed method. We use ControlNet to condition a diffusion model on EEG data in order to decode high-quality naturalistic music.
Figure 2: Qualitative results of our method. On the left, ground-truth musical chunks. In the middle, reconstructions obtained via a baseline ConvNet. On the right, decodings obtained by our method. Notice how our method better matches the real tracks.
Figure 3: Cross-CLAP scores between decoded and ground truth tracks. Left: Our method (ControlNet-2). Right: Baseline convolutional network. Notice how the matrices are closer to diagonal with our method, indicating higher correlation (as measured by CLAP score) between decoded and ground truth tracks.
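To make the idea of a cross-score matrix like the one in Figure 3 concrete, the sketch below builds a cosine-similarity matrix between embeddings of decoded tracks (rows) and ground-truth tracks (columns). It assumes the score is a cosine similarity between embedding vectors, which is a common way to define a CLAP score; the exact computation used in the paper is not shown in this section. The names `cross_similarity` and `diagonal_hit_rate` are hypothetical, and `diagonal_hit_rate` is an illustrative summary, not a metric reported by the authors.

```python
import numpy as np


def cross_similarity(decoded: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Cosine similarity between every decoded and every reference embedding.

    decoded: (num_decoded, dim), reference: (num_reference, dim).
    Returns a (num_decoded, num_reference) matrix.
    """
    d = decoded / np.linalg.norm(decoded, axis=1, keepdims=True)
    r = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    return d @ r.T


def diagonal_hit_rate(sim: np.ndarray) -> float:
    """Fraction of rows whose largest score lies on the diagonal.

    Assumes a square matrix in which row i and column i refer to the
    same track. Illustrative only.
    """
    return float(np.mean(sim.argmax(axis=1) == np.arange(sim.shape[0])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder embeddings: one vector per track.
    reference = rng.normal(size=(10, 32))
    decoded = reference + 0.5 * rng.normal(size=(10, 32))
    sim = cross_similarity(decoded, reference)
    print(sim.shape, diagonal_hit_rate(sim))
```

A matrix that is close to diagonal means each decoded track scores highest against its own ground-truth track, which is the pattern the caption points to.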
