Mind2Matter: Creating 3D Models from EEG Signals

Xia Deng, Shen Chen, Jiale Zhou, Lei Li

TL;DR

Mind2Matter reconstructs 3D scenes from EEG with a two-stage framework: it first translates EEG signals into descriptive text, then generates 3D scenes from that text using layout-guided 3D Gaussian splatting. The EEG-to-text stage combines a Graph Attention (GA) based encoder, multi-scale temporal processing, and a partial fine-tuning scheme with an adaptive-margin cross-modal loss that aligns EEG embeddings with image-language representations, which are then fed to a frozen LLM. The text-to-3D stage uses LLM-generated object layouts and score distillation sampling (SDS) guided optimization of anisotropic Gaussians under layout priors, enabling coherent multi-object scenes with diffusion priors. Experiments on an EEG-Image dataset show improved textual semantics and 3D fidelity over baselines, and ablations confirm the importance of the GA module, CAML, and label supervision. The work suggests a scalable, potentially real-time pathway from neural signals to structured 3D outputs, with possible impact on BCIs, VR, and neuroprosthetics.
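To make the idea of an adaptive-margin cross-modal loss more concrete, here is a minimal NumPy sketch of one possible reading. The exact formulation used by Mind2Matter is not given in this summary, so everything below is an assumption for illustration: the function name `adaptive_margin_alignment_loss`, the `base_margin` parameter, and the rule that the margin shrinks when two images are themselves similar are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def adaptive_margin_alignment_loss(eeg_emb, img_emb, base_margin=0.2):
    """Hinge-style EEG-to-image alignment loss with a per-pair margin.

    Hypothetical sketch, not the paper's formulation. Row i of `eeg_emb`
    is assumed to be paired with row i of `img_emb`. Needs batch size >= 2.
    """
    eeg = l2_normalize(np.asarray(eeg_emb, dtype=float))
    img = l2_normalize(np.asarray(img_emb, dtype=float))

    sim = eeg @ img.T              # (B, B) EEG-to-image cosine similarities
    pos = np.diag(sim)[:, None]    # similarity of each matched pair

    # Assumed adaptive rule: negatives that look like the true image
    # get a smaller margin than clearly different negatives.
    img_sim = img @ img.T
    margin = base_margin * (1.0 - img_sim)

    hinge = np.maximum(0.0, margin + sim - pos)
    np.fill_diagonal(hinge, 0.0)   # matched pairs are not negatives

    b = sim.shape[0]
    return hinge.sum() / (b * (b - 1))


# Toy check with three orthogonal "image" embeddings.
img = np.eye(3)
loss_aligned = adaptive_margin_alignment_loss(img, img)
loss_shuffled = adaptive_margin_alignment_loss(np.roll(img, 1, axis=0), img)
print(loss_aligned, loss_shuffled)
```

In this toy check, the loss is zero when each EEG embedding coincides with its paired image embedding and grows when the pairing is scrambled, which is the behavior an alignment objective of this kind is meant to have.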

Abstract

The reconstruction of 3D objects from brain signals has gained significant attention in brain-computer interface (BCI) research. Current research predominantly utilizes functional magnetic resonance imaging (fMRI) for 3D reconstruction tasks due to its excellent spatial resolution. Nevertheless, the clinical utility of fMRI is limited by its prohibitive cost and its inability to support real-time operation. In comparison, electroencephalography (EEG) presents distinct advantages as an affordable, non-invasive, and mobile solution for real-time brain-computer interaction systems. While recent advances in deep learning have enabled remarkable progress in image generation from neural data, decoding EEG signals into structured 3D representations remains largely unexplored. In this paper, we propose a novel framework that translates EEG recordings into 3D object reconstructions by leveraging neural decoding techniques and generative models. Our approach involves training an EEG encoder to extract spatiotemporal visual features, fine-tuning a large language model to interpret these features into descriptive multimodal outputs, and leveraging generative 3D Gaussians with layout-guided control to synthesize the final 3D structures. Experiments demonstrate that our model captures salient geometric and semantic features, paving the way for applications in BCIs, virtual reality, and neuroprosthetics. Our code is available at https://github.com/sddwwww/Mind2Matter.

Paper Structure

This paper contains 20 sections, 13 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Architecture of Mind2Matter. EEG signals are processed by a trainable EEG Encoder to extract spatiotemporal features, generating EEG embeddings aligned with image embeddings from a frozen CLIP Encoder. These embeddings are transformed by a trainable Mapping Network and fed into a frozen LLM, which generates a textual description (e.g., "A colorful butterfly is perched on a flower") using a prompt. The text is then used by another LLM to create an initial 3D layout, followed by object-level and scene-level optimization with 3D Gaussian splatting and diffusion priors, producing a high-fidelity 3D scene.
  • Figure 2: Architecture of the EEG Encoder.
  • Figure 3: Qualitative comparison of EEG-to-3D generation results. Comparison between Mind2Matter and baseline methods (DreamGaussian, GraphDreamer, and GSGEN) on four representative visual stimuli. Each row shows: (top) the original visual stimulus with reference description; (middle) the text description generated from EEG signals; (bottom) the final 3D reconstruction results from different methods. Mind2Matter demonstrates superior performance in both semantic preservation and geometric fidelity, while baseline methods exhibit various artifacts such as duplicated components or structural distortions.
  • Figure 4: Comparison of 3D Reconstruction Results for EEG-Text-3D and EEG-Image-3D. Top: EEG-Image-3D results show a good frontal view but distorted side views. Bottom: Our EEG-Text-3D method maintains consistent quality from all angles.
  • Figure 5: Additional EEG-to-Text Generation Results.