MinD-3D: Reconstruct High-quality 3D objects in Human Brain

Jianxiong Gao, Yuqian Fu, Yun Wang, Xuelin Qian, Jianfeng Feng, Yanwei Fu

TL;DR

Recon3DMind tackles the challenge of reconstructing 3D objects from human brain activity. It introduces the fMRI-Shape dataset and a three-stage MinD-3D framework that fuses multi-frame fMRI signals with diffusion-based visual feature generation and a latent-adapted 3D mesh decoder. The approach demonstrates strong semantic and structural reconstruction performance and reveals correlations between learned features and visual ROIs, highlighting the brain's 3D processing capabilities. This work paves the way for bridging cognitive neuroscience and 3D computer vision, enabling more faithful interpretations of human 3D perception and offering a new benchmark with cross-subject generalization tests.

Abstract

In this paper, we introduce Recon3DMind, an innovative task aimed at reconstructing 3D visuals from Functional Magnetic Resonance Imaging (fMRI) signals, marking a significant advancement in the fields of cognitive neuroscience and computer vision. To support this pioneering task, we present the fMRI-Shape dataset, which includes data from 14 participants and features 360-degree videos of 3D objects to enable comprehensive fMRI signal capture across various settings, thereby laying a foundation for future research. Furthermore, we propose MinD-3D, a novel and effective three-stage framework specifically designed to decode the brain's 3D visual information from fMRI signals, demonstrating the feasibility of this challenging task. The framework begins by extracting and aggregating features from fMRI frames through a neuro-fusion encoder, subsequently employs a feature bridge diffusion model to generate visual features, and ultimately recovers the 3D object via a generative transformer decoder. We assess the performance of MinD-3D using a suite of semantic and structural metrics and analyze the correlation between the features extracted by our model and the visual regions of interest (ROIs) in fMRI signals. Our findings indicate that MinD-3D not only reconstructs 3D objects with high semantic relevance and spatial similarity but also significantly enhances our understanding of the human brain's capabilities in processing 3D visual information. Project page at: https://jianxgao.github.io/MinD-3D.
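
The three-stage data flow described in the abstract can be sketched as a composition of functions. This is only a toy illustration of how the stages hand features to one another, not the authors' implementation: every function name, the tiny dimensions (`FEATURE_DIM`, `GRID`), the random weights, the interpolation loop standing in for the diffusion model, and the voxel-grid output standing in for the generated 3D object are hypothetical placeholders.

```python
import math
import random

FEATURE_DIM = 4   # hypothetical latent size, chosen small for illustration
GRID = 2          # hypothetical voxel-grid resolution (GRID x GRID x GRID)


def neuro_fusion_encoder(frames, w_enc):
    """Stage 1 placeholder: embed each fMRI frame, then average over frames."""
    per_frame = [
        [math.tanh(sum(w * v for w, v in zip(row, frame))) for row in w_enc]
        for frame in frames
    ]
    # Aggregate the per-frame embeddings into one feature vector.
    return [sum(col) / len(frames) for col in zip(*per_frame)]


def feature_bridge(cond, rng, steps=10):
    """Stage 2 placeholder: start from noise and step toward the condition.

    A real diffusion model would use a learned denoiser; this loop only
    mimics the iterative noise-to-feature structure.
    """
    x = [rng.gauss(0.0, 1.0) for _ in cond]
    for t in range(steps):
        x = [xi + (ci - xi) / (steps - t) for xi, ci in zip(x, cond)]
    return x


def shape_decoder(feature, w_dec):
    """Stage 3 placeholder: map the visual feature to a boolean voxel grid."""
    logits = [sum(w * f for w, f in zip(row, feature)) for row in w_dec]
    occupied = [value > 0 for value in logits]
    return [
        [occupied[(i * GRID + j) * GRID:(i * GRID + j + 1) * GRID]
         for j in range(GRID)]
        for i in range(GRID)
    ]


def reconstruct(frames, w_enc, w_dec, rng):
    """Chain the three stages: fMRI frames -> fused feature -> visual feature -> shape."""
    fmri_feature = neuro_fusion_encoder(frames, w_enc)
    visual_feature = feature_bridge(fmri_feature, rng)
    return shape_decoder(visual_feature, w_dec)


rng = random.Random(0)
FRAME_SIZE = 12   # hypothetical flattened frame length
frames = [[rng.gauss(0.0, 1.0) for _ in range(FRAME_SIZE)] for _ in range(6)]
w_enc = [[rng.gauss(0.0, 0.3) for _ in range(FRAME_SIZE)] for _ in range(FEATURE_DIM)]
w_dec = [[rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(GRID ** 3)]
voxels = reconstruct(frames, w_enc, w_dec, rng)
print(len(voxels), len(voxels[0]), len(voxels[0][0]))
```

The point of the sketch is the interface between stages: several fMRI frames collapse into a single feature vector, that vector conditions a generative step, and only the resulting visual feature reaches the shape decoder.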

Paper Structure

This paper contains 34 sections, 10 equations, 13 figures, 6 tables.

Figures (13)

  • Figure 1: Overview of the Recon3DMind task, showing the fMRI-Shape dataset collection process, in which 14 participants watch 360-degree view videos of 3D objects, and the MinD-3D framework for reconstructing 3D objects from fMRI signals.
  • Figure 2: Comparison of fMRI-Shape with existing 2D fMRI datasets. As the first 3D fMRI dataset, fMRI-Shape features a larger number of participants and frames, providing ample support for experiments in our proposed novel task and further research.
  • Figure 3: Overview of the fMRI-Shape Acquisition Process. Initially, we render each object into an 8-second video showcasing a 360-degree view. Subsequent fMRI signal capture is performed in video format, followed by data processing with fMRIPrep to convert signals from 32k_fs_LR surface space into 2D images of dimensions 1023 $\times$ 2514. Individual differences observed in the dataset, as highlighted in the middle part, underscore the challenges in generalizing these findings. On the right, regions of interest (ROIs) are transformed into a 256 $\times$ 256 image.
  • Figure 4: Overview of the MinD-3D Framework. Our approach combines a Neuro-Fusion Encoder for extracting features from fMRI frames, a Feature Bridge Diffusion Model for generating visual features from these fMRI signals, and a Latent Adapted Decoder based on the Argus 3D shape generator for reconstructing 3D objects. This integrated system effectively aligns and translates brain signals into accurate 3D visual representations. Note that the CLIP encoder is used only during training, not at inference.
  • Figure 5: Qualitative results generated by LEA-3D, fMRI-PTE-3D, and our method. GT indicates the ground-truth 3D objects. All objects have been rendered into a 2D format.
  • ...and 8 more figures