Mind2Matter: Creating 3D Models from EEG Signals
Xia Deng, Shen Chen, Jiale Zhou, Lei Li
TL;DR
Mind2Matter reconstructs 3D scenes from EEG with a two-stage framework: it first translates EEG signals into descriptive text, then renders 3D scenes from that text using layout-guided 3D Gaussian splatting. The EEG-to-text stage combines a Graph Attention (GA) based encoder, multi-scale temporal processing, and a partial fine-tuning scheme with an adaptive-margin cross-modal loss (CAML) that aligns EEG embeddings with image-language representations, while the LLM that produces the text stays frozen. The text-to-3D stage takes object layouts generated by an LLM and optimizes anisotropic Gaussians with score distillation sampling (SDS) under those layout priors, so that diffusion priors yield coherent multi-object scenes. Experiments on an EEG-Image dataset show improved textual semantics and 3D fidelity over baselines, and ablations confirm the importance of the GA module, CAML, and label supervision. The work points toward a scalable pathway from neural signals to structured 3D outputs that could support real-time use, with potential impact on BCIs, VR, and neuroprosthetics.
Abstract
The reconstruction of 3D objects from brain signals has gained significant attention in brain-computer interface (BCI) research. Current research predominantly uses functional magnetic resonance imaging (fMRI) for 3D reconstruction tasks because of its excellent spatial resolution. However, the clinical utility of fMRI is limited by its prohibitive cost and its inability to support real-time operation. In comparison, electroencephalography (EEG) is affordable, non-invasive, and mobile, which makes it well suited to real-time brain-computer interaction systems. While recent advances in deep learning have enabled remarkable progress in image generation from neural data, decoding EEG signals into structured 3D representations remains largely unexplored. In this paper, we propose a novel framework that translates EEG recordings into 3D object reconstructions by leveraging neural decoding techniques and generative models. Our approach involves training an EEG encoder to extract spatiotemporal visual features, fine-tuning a large language model to interpret these features into descriptive multimodal outputs, and using generative 3D Gaussians with layout-guided control to synthesize the final 3D structures. Experiments demonstrate that our model captures salient geometric and semantic features, paving the way for applications in BCIs, virtual reality, and neuroprosthetics. Our code is available at https://github.com/sddwwww/Mind2Matter.
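
To make the cross-modal alignment idea from the TL;DR more concrete, the following is a minimal sketch of one possible adaptive-margin hinge loss between EEG embeddings and their paired image-language embeddings. It is not the paper's CAML formulation, which is not given here. The function name `adaptive_margin_loss`, the `base_margin` parameter, and the rule that the margin shrinks when two targets are semantically similar are all assumptions chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_margin_loss(eeg_embs, target_embs, base_margin=0.2):
    """Hypothetical adaptive-margin hinge loss over a batch of paired embeddings.

    eeg_embs[i] and target_embs[i] are assumed to describe the same stimulus.
    For every mismatched pair (i, j) the EEG embedding i should be closer to
    its own target i than to target j by at least a margin. The margin is an
    assumption of this sketch: base_margin * (1 - cos(target_i, target_j)),
    so near-duplicate targets are penalized less than unrelated ones.
    """
    eeg = [l2_normalize(v) for v in eeg_embs]
    tgt = [l2_normalize(v) for v in target_embs]
    n = len(eeg)
    total, count = 0.0, 0
    for i in range(n):
        pos = dot(eeg[i], tgt[i])  # similarity to the matching target
        for j in range(n):
            if i == j:
                continue
            neg = dot(eeg[i], tgt[j])  # similarity to a mismatched target
            margin = base_margin * (1.0 - dot(tgt[i], tgt[j]))
            total += max(0.0, margin + neg - pos)
            count += 1
    # Average over all mismatched pairs; a batch of one has no negatives.
    return total / count if count else 0.0


if __name__ == "__main__":
    targets = [[1.0, 0.0], [0.0, 1.0]]
    print(adaptive_margin_loss([[1.0, 0.0], [0.0, 1.0]], targets))  # aligned batch
    print(adaptive_margin_loss([[0.0, 1.0], [1.0, 0.0]], targets))  # swapped batch
```

In this sketch a perfectly aligned batch incurs no penalty, while a batch whose EEG embeddings point at the wrong targets is penalized by the similarity gap plus the margin. A trainable version would compute the same quantity on tensors in an autodiff framework rather than on Python lists.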
