MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding

Yuxiang Wei, Yanteng Zhang, Xi Xiao, Tianyang Wang, Xiao Wang, Vince D. Calhoun

TL;DR

MoRE-Brain introduces a neuro-inspired two-stage framework for fMRI visual decoding that uses a hierarchical Mixture-of-Experts encoder with $L=4$ levels (yielding $2^L=16$ experts) to map voxel activity to CLIP embeddings, and a diffusion-based image generator guided by Time and Space routers. It enables cross-subject generalization by freezing the expert weights and training only subject-specific routers, which reduces the data needed to adapt to a new subject. The dual routing provides interpretability by revealing the temporal and spatial contributions of different brain networks to semantic and spatial image features, supported by bottleneck analyses and ICA/GradientSHAP attributions. On NSD, it achieves competitive reconstruction fidelity while offering richer neuroscientific insights, representing a step toward generalizable and interpretable brain decoding, with code to be released publicly.
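
One way to see how $L=4$ levels can yield $2^L=16$ experts is a soft binary routing tree: each level splits every node's weight between two children, so each leaf weight is a product of per-level gates and the leaf weights still sum to one. The sketch below assumes this binary-tree layout, softmax gates, and the helper names `leaf_weights` and `mix_experts`; all of these are hypothetical illustrations, not the MoRE-Brain implementation, whose actual routing (for example top-k selection or how voxels are grouped per expert) may differ.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def leaf_weights(router_logits, levels):
    """Compose per-level binary gates into weights over 2**levels leaf experts.

    router_logits[d] holds one (left, right) logit pair for each of the 2**d
    nodes at depth d. This layout is a hypothetical illustration.
    """
    weights = [1.0]  # the root starts with all of the routing mass
    for depth in range(levels):
        pairs = router_logits[depth]
        assert len(pairs) == len(weights) == 2 ** depth
        next_weights = []
        for parent_w, pair in zip(weights, pairs):
            left, right = softmax(list(pair))
            next_weights.extend([parent_w * left, parent_w * right])
        weights = next_weights
    return weights

def mix_experts(expert_outputs, weights):
    """Weighted sum of leaf-expert output vectors (e.g. CLIP-space embeddings)."""
    dim = len(expert_outputs[0])
    return [sum(w * out[k] for w, out in zip(weights, expert_outputs)) for k in range(dim)]

L = 4
rng = random.Random(0)
logits = [[(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2 ** depth)] for depth in range(L)]
w = leaf_weights(logits, L)
print(len(w), round(sum(w), 6))
```

Because every gate is a two-way softmax, the 16 leaf weights form a valid distribution, so mixing identical expert outputs returns that same output unchanged.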

Abstract

Decoding visual experiences from fMRI offers a powerful avenue to understand human perception and develop advanced brain-computer interfaces. However, current progress often prioritizes maximizing reconstruction fidelity while overlooking interpretability, an essential aspect for deriving neuroscientific insight. To address this gap, we propose MoRE-Brain, a neuro-inspired framework designed for high-fidelity, adaptable, and interpretable visual reconstruction. MoRE-Brain uniquely employs a hierarchical Mixture-of-Experts architecture where distinct experts process fMRI signals from functionally related voxel groups, mimicking specialized brain networks. The experts are first trained to encode fMRI into the frozen CLIP space. A finetuned diffusion model then synthesizes images, guided by expert outputs through a novel dual-stage routing mechanism that dynamically weighs expert contributions across the diffusion process. MoRE-Brain offers three main advancements: First, it introduces a novel Mixture-of-Experts architecture grounded in brain network principles for neuro-decoding. Second, it achieves efficient cross-subject generalization by sharing core expert networks while adapting only subject-specific routers. Third, it provides enhanced mechanistic insight, as the explicit routing reveals precisely how different modeled brain regions shape the semantic and spatial attributes of the reconstructed image. Extensive experiments validate MoRE-Brain's high reconstruction fidelity, with bottleneck analyses further demonstrating its effective utilization of fMRI signals, distinguishing genuine neural decoding from over-reliance on generative priors. Consequently, MoRE-Brain marks a substantial advance towards more generalizable and interpretable fMRI-based visual decoding. Code will be publicly available soon: https://github.com/yuxiangwei0808/MoRE-Brain.
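
The cross-subject recipe (shared, frozen experts with trainable subject-specific routers) and the idea of weighting experts differently across diffusion steps can be illustrated with a toy time router. In this sketch, which is an assumption rather than the paper's design, each subject's router is a linear function of the normalized timestep followed by a softmax over experts; `SubjectRouter`, `condition`, and the toy embeddings are hypothetical, and the Space Router is not modeled.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

class SubjectRouter:
    """Hypothetical subject-specific time router (the only per-subject trainable part).

    Expert i receives the logit slope[i] * tau + bias[i], where tau = t / num_steps
    is the normalized diffusion timestep. The linear form is an assumption.
    """

    def __init__(self, slope, bias):
        self.slope = list(slope)
        self.bias = list(bias)

    def weights(self, t, num_steps):
        tau = t / num_steps
        return softmax([s * tau + b for s, b in zip(self.slope, self.bias)])

def condition(expert_embeddings, router, t, num_steps):
    """Router-weighted sum of the frozen, shared expert embeddings at step t."""
    w = router.weights(t, num_steps)
    dim = len(expert_embeddings[0])
    return [sum(wi * e[k] for wi, e in zip(w, expert_embeddings)) for k in range(dim)]

# Frozen expert outputs shared by every subject (toy 2-d vectors).
shared_experts = [[1.0, 0.0], [0.0, 1.0]]

# Subject A: favors expert 0 at high-noise steps and expert 1 near t = 0.
router_a = SubjectRouter(slope=[4.0, -4.0], bias=[-2.0, 2.0])
early = condition(shared_experts, router_a, t=1000, num_steps=1000)
late = condition(shared_experts, router_a, t=0, num_steps=1000)

# Subject B reuses the same experts; only its router parameters differ.
router_b = SubjectRouter(slope=[0.0, 0.0], bias=[0.0, 0.0])
uniform = condition(shared_experts, router_b, t=500, num_steps=1000)
```

In this sketch only `slope` and `bias` (two numbers per expert) would be fit for a new subject, which illustrates why adapting routers alone needs far fewer parameters than retraining the experts.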

Paper Structure

This paper contains 38 sections, 9 equations, 23 figures, 4 tables.

Figures (23)

  • Figure 1: Overview of MoRE-Brain: fMRI is encoded into CLIP space by the hierarchical MoE (left), guiding image generation via dynamic Time/Space Routers (right).
  • Figure 2: Routing mechanism of MoRE-Brain. (a) The hierarchical fMRI MoE encoder processes voxels using routed, specialized experts across levels. (b) Time and Space Routers that adaptively select and modulate expert outputs.
  • Figure 3: Reconstructions from different methods for Subject 1. MoRE-Brain reconstructs the visual stimuli faithfully compared with the baselines. An optional refinement step (see the inference appendix) is used to enhance the images.
  • Figure 4: Quantitative performance across varying bottleneck sizes. MoRE-Brain shows a consistent and notable performance drop across most metrics as the bottleneck size decreases. Moreover, its performance drops most sharply once a bottleneck is introduced, indicating that the bottleneck substantially disrupts the rich information the model has learned and that the model relies heavily on the fMRI data. In contrast, MindEye2 and MindBridge degrade less, particularly MindBridge on SSIM and InceptionV3, suggesting a greater influence of learned priors. Notably, MindEye2 maintains high CLIP-Cos even at extreme bottlenecks, whereas the other methods show the expected decline under information restriction.
  • Figure 5: Interpretability of overall model contributions via ICA. Brain regions that consistently contribute to reconstructions across the 8 subjects are identified as independent components (ICs). The ICs highlight engagement of known visual processing areas (e.g., visual central/peripheral) and higher-order association areas (e.g., dorsal attention network), indicating that MoRE-Brain learns neurophysiologically plausible mappings. See the appendix on overall visualizations for all ICs and ROI labels.
  • ...and 18 more figures
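
The bottleneck analysis in Figure 4 tests how much a model's output depends on information passing through a restricted channel. The toy below assumes CLIP-Cos is the cosine similarity between a predicted and a target embedding, and uses coordinate truncation as a stand-in bottleneck; the truncation, the helper names `cosine` and `bottleneck`, and the random 64-dimensional "embedding" are hypothetical and only show the expected decline as the bottleneck narrows, not the paper's actual bottleneck design.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two vectors (assumed definition of CLIP-Cos)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def bottleneck(x, k):
    """Hypothetical bottleneck: zero out every dimension beyond the first k."""
    return [xi if i < k else 0.0 for i, xi in enumerate(x)]

rng = random.Random(0)
target = [rng.gauss(0.0, 1.0) for _ in range(64)]  # stand-in "CLIP" embedding
sizes = [64, 32, 16, 8, 4]  # from no restriction down to a narrow bottleneck
scores = {k: cosine(target, bottleneck(target, k)) for k in sizes}
for k in sizes:
    print(k, round(scores[k], 3))
```

Here the similarity equals the fraction of the target's norm that survives truncation, so it can only fall as `k` shrinks; a model whose outputs barely move under such restriction is, by the caption's reasoning, leaning more on learned priors than on the input signal.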