NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation

Jingyang Huo, Yikai Wang, Xuelin Qian, Yun Wang, Chong Li, Jianfeng Feng, Yanwei Fu

TL;DR

NeuroPictor addresses fMRI-to-image reconstruction by directly modulating diffusion model generation with brain signals. It introduces a universal fMRI latent space learned via multi-subject pretraining and couples it with a High-Level Guiding Network for semantics and a Low-Level Manipulation Network for fine-grained structure, enabling end-to-end fMRI-to-image decoding. Trained on ~67,000 fMRI–image pairs across eight subjects, the method yields superior within-subject decoding and robust cross-subject transfer, outperforming several baselines in both semantic fidelity and structural detail. The work advances practical fMRI decoding with end-to-end diffusion conditioning and provides insights into multi-level brain-to-image mapping, with open-source code for reproducibility and adaptation.

Abstract

Recent fMRI-to-image approaches have mainly focused on associating fMRI signals with specific conditions of pre-trained diffusion models. These approaches, while producing high-quality images, capture only a limited aspect of the complex information in fMRI signals and offer little detailed control over image creation. In contrast, this paper proposes to directly modulate the generation process of diffusion models using fMRI signals. Our approach, NeuroPictor, divides the fMRI-to-image process into three steps: i) fMRI calibrated-encoding, which uses multi-individual pre-training to learn a shared latent space that minimizes individual differences and enables the subsequent multi-subject training; ii) fMRI-to-image multi-subject pre-training, which perceptually learns to guide the diffusion model with high- and low-level conditions across different individuals; iii) fMRI-to-image single-subject refining, which is similar to step ii but focuses on adapting to a particular individual. NeuroPictor extracts high-level semantic features from fMRI signals that characterize the visual stimulus and incrementally fine-tunes the diffusion model with a low-level manipulation network to provide precise structural instructions. By training with about 67,000 fMRI-image pairs from various individuals, our model achieves superior fMRI-to-image decoding capacity, particularly in the within-subject setting, as evidenced on benchmark datasets. Our code and model are available at https://jingyanghuo.github.io/neuropictor/.
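The three-step schedule described in the abstract can be sketched as plain data. This is only an illustrative outline, not the authors' code: the `Stage` class, the `plan` helper, and the subject identifiers are hypothetical names, and the sketch records only what the abstract states (which steps use all subjects and which use a single one).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training step as summarized in the abstract (hypothetical container)."""
    name: str
    scope: str      # "multi" = all subjects, "single" = one target subject
    goal: str       # paraphrase of the abstract, not an implementation detail


# Order and wording follow the abstract's steps i) to iii).
STAGES = (
    Stage("fMRI calibrated-encoding", "multi",
          "learn a shared fMRI latent space across individuals"),
    Stage("fMRI-to-image multi-subject pre-training", "multi",
          "guide the diffusion model with high- and low-level conditions"),
    Stage("fMRI-to-image single-subject refining", "single",
          "adapt the pre-trained model to one individual"),
)


def plan(all_subjects, target):
    """Return (stage name, subjects used) pairs for a given target subject."""
    schedule = []
    for stage in STAGES:
        # Multi-subject stages see every subject; the refining stage sees one.
        used = list(all_subjects) if stage.scope == "multi" else [target]
        schedule.append((stage.name, used))
    return schedule


if __name__ == "__main__":
    # Subject identifiers here are placeholders, not dataset labels.
    for name, used in plan(["subj_a", "subj_b", "subj_c"], "subj_b"):
        print(f"{name}: {used}")
```

Keeping the schedule as data makes the one structural difference between steps ii and iii explicit: the strategy is the same, only the set of subjects changes.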

Paper Structure

This paper contains 31 sections, 10 equations, 15 figures, and 10 tables.

Figures (15)

  • Figure 1: NeuroPictor achieves precise control over decoding low-level structures from fMRI signals while preserving high-level semantics. The decoded images progress from reconstructing visual stimulus solely from high-level semantics to both high-level semantics and low-level structures as the influence increases from left to right.
  • Figure 2: NeuroPictor can swap high-level fMRI features to manipulate image semantics while maintaining structural consistency.
  • Figure 3: Our NeuroPictor framework is trained in three steps for fMRI-to-image decoding: i) the fMRI calibrated-encoding stage, which establishes a universal latent fMRI space across multiple individuals; ii) the fMRI-to-image multi-subject pre-training stage, which achieves multi-level modulation through perceptual learning; iii) the fMRI-to-image single-subject refining stage, which uses the same strategy as step ii but focuses on refinement for a particular subject.
  • Figure 4: Framework of our High-Level Guiding Network and Low-Level Manipulation Network.
  • Figure 5: Qualitative comparison of our NeuroPictor and previous state-of-the-art methods. Compared with other methods, our NeuroPictor achieves consistency in both high-level semantics and low-level structures.
  • ...and 10 more figures
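
The progression described in the Figure 1 caption (from semantics-only decoding to semantics plus structure as the low-level influence increases) can be illustrated with a toy example. The sketch assumes the low-level signal enters as an additive residual scaled by a scalar strength; that mechanism, the function name `modulate`, and the numbers are assumptions made for illustration and are not taken from the paper.

```python
def modulate(base_features, low_level_residual, strength):
    """Blend a low-level residual into base features.

    ASSUMPTION: additive, scalar-weighted modulation. The paper's actual
    Low-Level Manipulation Network may combine features differently.
    """
    if len(base_features) != len(low_level_residual):
        raise ValueError("feature vectors must have the same length")
    return [b + strength * r for b, r in zip(base_features, low_level_residual)]


if __name__ == "__main__":
    base = [1.0, 2.0]        # stand-in for semantics-driven features
    residual = [0.5, -1.0]   # stand-in for fMRI-derived structural features
    # strength = 0.0 leaves the base untouched; larger values add more structure.
    for strength in (0.0, 0.5, 1.0):
        print(strength, modulate(base, residual, strength))
```

With strength 0.0 the output equals the base features, which corresponds to the leftmost case in the caption (high-level semantics only); increasing the strength moves toward the rightmost case.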