Brain3D: Generating 3D Objects from fMRI

Yuankun Yang; Li Zhang; Ziyang Xie; Zhiyuan Yuan; Jianfeng Feng; Xiatian Zhu; Yu-Gang Jiang

TL;DR

Brain3D tackles the challenge of decoding 3D visual representations from brain activity by translating fMRI signals into multi-level, semantically rich embeddings and feeding them into a diffusion-based 3D generator. The method employs a dual-encoder architecture (low-level visual cortex and high-level semantic regions), a UMAP-based stabilization of high-level signals, and a two-stage generation pipeline that combines NeRF-based perceptual synthesis with DMTet-based semantic meshing, all guided by Score Distillation Sampling. Extensive experiments on NSD and GOD datasets demonstrate superior 3D generation quality and reveal distinct yet cooperative roles of visual regions (notably V1) and the medial temporal lobe, with preliminary clinical evaluations suggesting potential in regional disorder diagnosis. The work advances cross-modality neural decoding by linking fMRI to 3D objects, offering both neuroscientific insight into the visual system and practical tools for clinical fMRI evaluation and diagnosis.
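For reference, Score Distillation Sampling is usually written as the gradient below, following the DreamFusion formulation. This is a sketch in DreamFusion's notation and may not match this paper's own equations: $\theta$ denotes the parameters of the 3D representation (NeRF or DMTet mesh), $x = g(\theta)$ is an image rendered from a random view, $\hat{\epsilon}_\phi$ is the noise predicted by a frozen 2D diffusion model, $w(t)$ is a timestep weighting, and $y$ is the conditioning signal, assumed here to be the fMRI-derived embeddings.

```latex
x_t = \alpha_t x + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```

The gradient pushes the rendered view toward images that the conditioned diffusion model considers likely, without backpropagating through the diffusion model itself.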

Abstract

Understanding the hidden mechanisms behind human visual perception is a fundamental question in neuroscience. To that end, investigating the neural responses that accompany mental activity, as measured by functional Magnetic Resonance Imaging (fMRI), has been a significant research vehicle. However, analyzing fMRI signals is challenging and costly, and it demands professional training. Despite remarkable progress in fMRI analysis, existing approaches are limited to generating 2D images and remain far from biologically meaningful or practically useful. Motivated by this, we propose to generate visually plausible and functionally more comprehensive 3D outputs decoded from brain signals, enabling more sophisticated modeling of fMRI data. Conceptually, we reformulate this task as an fMRI-conditioned 3D object generation problem. We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject who was presented with a 2D image and yields as output the corresponding 3D object images. The key capabilities of this model include tackling noise with high-level semantic signals and a two-stage architecture design for progressive integration of high-level information. Extensive experiments validate the superior capability of our model over previous state-of-the-art 3D object generation methods. Importantly, we show that our model captures the distinct functionalities of each region of the human visual system as well as their intricate interplay, aligning remarkably well with established discoveries in neuroscience. Further, preliminary evaluations indicate that Brain3D can successfully identify disordered brain regions in simulated scenarios, such as V1, V2, V3, V4, and the medial temporal lobe (MTL) within the human visual system. Our data and code will be available at https://brain-3d.github.io/.

Paper Structure (17 sections, 19 equations, 13 figures)

Introduction
Related work
Method
Overview
Preliminary: Diffusion based 3D generation
Tackling the noises with high-level semantics
Two-stage 3D generation
Training and inference
Experiments
Dataset
Evaluation metrics
Qualitative evaluation
Brain region functionalities
Clinical evaluation
Ablation analysis
...and 2 more sections

Figures (13)

Figure 1: Brain3D: Generating 3D objects from brain fMRI signals. A subject's visual system first perceives the object stimulus, triggering specific patterns of activity in the brain's visual processing areas, which are then captured as fMRI data. Brain3D then uses an fMRI encoder to extract features and feeds them into a 2D diffusion model for detailed 3D generation. Training optimizes the low-level and high-level encoders with the feature distillation objectives $L_{image}$ and $L_{text}$, respectively, each incorporating bidirectional CLIP Radford2021clip contrastive learning and MixUp Zhang2017mixup augmentation for enhanced generalization. By contrasting the object stimulus with the generated 3D visuals, our approach offers the potential for exploring brain region functionalities and aiding in the diagnosis of neurological conditions.
Figure 2: Overview of Brain3D, which decodes functional MRI (fMRI) signals into 3D reconstructions. (a) Initially, the subject's visual system perceives the object stimulus, which triggers specific patterns of neural activity within the brain's visual processing areas. These patterns are recorded as fMRI signals. (b) To generate the 3D object, Brain3D employs two specialized encoders: the high-level encoder uses an MLP and UMAP mcinnes2018umap to capture abstract, high-level visual concepts, while the low-level encoder processes basic visual details via Versatile Diffusion Xu2023. (c) Both encoders process fMRI to extract their respective information for representing 3D objects. (d) These encoders are optimized through feature distillation, which uses pretrained CLIP-Text and CLIP-Image encoders Radford2021clip as guidance for the high-level and low-level features, respectively. (e) The 3D object generation process includes two phases: the perceptual phase uses NeRF Ben2020NeRF, and the semantic phase transforms the NeRF output into a 3D mesh Hoppe1993mesh using DMTet Shen2021dmtet. The 3D object is refined using Score Distillation Sampling (SDS) poole2022dreamfusion, which adds Gaussian noise to an image rendered from a random view and employs 2D diffusion models for noise prediction. Both the high-level and low-level feature guidance serve as conditions for the pretrained Stable Diffusion rombach2022high and Zero-1-to-3 liu2023zero-1-to-3 models to predict the added noise.
Figure 3: Variety in high-level information. (a) The reference images presented to the participants. (b) Images generated by Stable Diffusion rombach2022high conditioned on the high-level embedding without UMAP mcinnes2018umap projection. (c, d) Images conditioned on different high-level embeddings after UMAP projection. This shows that the high-level embedding extracted directly from fMRI is highly variable and noisy, and that our UMAP projection significantly mitigates this problem.
Figure 4: Brain3D generates finer 3D visuals from fMRI under 2D stimuli compared with previous methods. (a) Evaluation on the Generic Object Decoding (GOD) horikawa2017generic dataset. The first column shows the 2D stimuli displayed to the subjects. The second, third, and fourth columns show the 2D reconstructions from MinD-Vis chen2023seeing, IC-GAN ozcelik2022reconstruction, and Gaziv gaziv2022self. The last columns display multi-view visualizations of our 3D visuals generated from fMRI. Our method not only achieves the best quality of fMRI-based generation, but also uniquely infers 3D geometry. (b) Evaluation on the Natural Scenes Dataset (NSD) allen2022massive. For each case, the first column displays the 2D stimuli presented to the subjects. The second column presents the 2D reconstruction from MindEye Scotti2024. The third and fourth columns show two distinct views of our 3D visuals generated from fMRI, followed by the final column depicting the surface normals of our generation, yielding higher quality output than MindEye Scotti2024. Zoomed-in visualizations in the last two rows demonstrate the higher quality and accuracy of our fMRI-based generation.
Figure 5: Collaborative property of the left and right hemispheres with Brain3D. (a) To examine the differences and interaction between the two hemispheres of the human brain, we conduct specialized experiments assessing 3D generation quality using fMRI from either hemisphere or both. (b) The function of the left and right hemispheres varies significantly across specific objects and metrics. In general, the left favors finer details and intricate structures, while the right focuses more on overall shape and silhouette. (c) Notably, combining data from both hemispheres leads to enhanced generation. This synergistic effect is evident in improved performance across various metrics.
...and 8 more figures
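
The Figure 1 caption mentions bidirectional CLIP contrastive learning combined with MixUp augmentation for training the fMRI encoders. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: the function names, the temperature of 0.07, the Beta parameter `alpha`, and the use of soft targets for mixed samples are all assumptions made for illustration, and random arrays stand in for real fMRI and CLIP embeddings.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def bidirectional_contrastive_loss(fmri_emb, clip_emb, targets=None, temperature=0.07):
    # Symmetric (fMRI-to-CLIP and CLIP-to-fMRI) cross-entropy over similarities.
    # targets[i, j] is the weight with which fMRI sample i matches CLIP sample j;
    # it defaults to the identity (hard one-to-one pairing).
    n = len(fmri_emb)
    if targets is None:
        targets = np.eye(n)
    logits = l2_normalize(fmri_emb) @ l2_normalize(clip_emb).T / temperature
    loss_f2c = -(targets * log_softmax(logits, axis=1)).sum(axis=1).mean()
    loss_c2f = -(targets * log_softmax(logits, axis=0)).sum(axis=0).mean()
    return 0.5 * (loss_f2c + loss_c2f)

def mixup(x, rng, alpha=0.2):
    # Blend each sample with a randomly chosen partner (assumed MixUp variant).
    # Returns the mixed inputs and the matching soft target matrix.
    n = len(x)
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(n)
    mixed = lam * x + (1.0 - lam) * x[perm]
    targets = lam * np.eye(n) + (1.0 - lam) * np.eye(n)[perm]
    return mixed, targets

# Toy data: random vectors standing in for CLIP and fMRI-derived embeddings.
rng = np.random.default_rng(0)
clip_emb = rng.normal(size=(8, 16))
fmri_emb = clip_emb + 0.01 * rng.normal(size=(8, 16))

loss_aligned = bidirectional_contrastive_loss(fmri_emb, clip_emb)
loss_shuffled = bidirectional_contrastive_loss(fmri_emb, clip_emb[::-1])

mixed_fmri, soft_targets = mixup(fmri_emb, rng)
loss_mixup = bidirectional_contrastive_loss(mixed_fmri, clip_emb, soft_targets)
```

In this toy setup the correctly paired batch gives a much lower loss than the reversed pairing, which is the signal that pulls fMRI embeddings toward their CLIP counterparts. In an actual training loop, MixUp would be applied to the raw fMRI inputs before the encoder rather than to the embeddings as done here for brevity.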
