Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing
Xuanhua Yin, Runkai Zhao, Weidong Cai
TL;DR
This paper tackles multimodal brain encoding under substantial inter-subject variability. It introduces AFIRE, a fusion-agnostic interface that standardizes time-aligned post-fusion tokens, and MIND, a sparse Mixture-of-Experts decoder whose routing is personalized by SADGate, a subject-aware dynamic gate. Decoupling fusion from decoding enables plug-and-play integration with diverse backbones and end-to-end whole-brain prediction. Empirical results on the Algonauts 2025 benchmark show consistent improvements across backbones (TRIBE, ImageBind, Qwen2.5-Omni) and subjects, with stronger cross-subject generalization and interpretable expert routing patterns. These findings suggest that pairing a fusion-agnostic interface with a subject-aware MoE decoder yields robust, personalized, and scalable gains for naturalistic neuroimaging encoding.
Abstract
Naturalistic fMRI encoding must handle multimodal inputs, heterogeneous fusion strategies, and pronounced inter-subject variability. We introduce AFIRE (Agnostic Framework for Multimodal fMRI Response Encoding), an agnostic interface that standardizes time-aligned post-fusion tokens from varied encoders, and MIND, a plug-and-play Mixture-of-Experts decoder with subject-aware dynamic gating. The full model is trained end-to-end for whole-brain prediction: AFIRE decouples the decoder from upstream fusion, while MIND combines token-dependent Top-K sparse routing with a subject prior to personalize expert usage without sacrificing generality. Experiments across multiple multimodal backbones and subjects show consistent improvements over strong baselines, enhanced cross-subject generalization, and interpretable expert patterns that correlate with content type. The framework offers a simple attachment point for new encoders and datasets, enabling robust, plug-and-improve performance for naturalistic neuroimaging studies.
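The abstract describes MIND's routing only at a high level. As a rough illustration, the NumPy sketch below shows one way token-dependent Top-K routing could be combined with a subject prior. It assumes that the prior is a learned per-subject bias added to the router logits and that each expert is a linear map from a standardized token of width `d_model` to whole-brain voxel responses. The class `SubjectAwareTopKMoE` and all of its parameters are hypothetical; this is not the authors' SADGate or MIND implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class SubjectAwareTopKMoE:
    """Hypothetical MIND-style decoder sketch: token-dependent Top-K routing
    plus an assumed additive per-subject prior on the gate logits."""

    def __init__(self, d_model, n_voxels, n_experts, n_subjects, k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Token-dependent router: one logit per expert for every token.
        self.w_gate = rng.normal(0.0, d_model ** -0.5, (d_model, n_experts))
        # Assumed subject prior: a learned bias over experts for each subject.
        self.subject_bias = rng.normal(0.0, 0.1, (n_subjects, n_experts))
        # Each expert is a linear map from token features to voxel responses.
        self.experts = rng.normal(0.0, d_model ** -0.5, (n_experts, d_model, n_voxels))

    def gate(self, tokens, subject_id):
        # tokens: [T, d_model] -> routing weights: [T, n_experts]
        logits = tokens @ self.w_gate + self.subject_bias[subject_id]
        topk = np.argsort(logits, axis=-1)[:, -self.k:]  # indices of the K largest logits
        masked = np.full_like(logits, -np.inf)
        np.put_along_axis(masked, topk, np.take_along_axis(logits, topk, axis=-1), axis=-1)
        # Softmax over the selected experts only; all other experts get weight 0.
        return softmax(masked, axis=-1)

    def __call__(self, tokens, subject_id):
        weights = self.gate(tokens, subject_id)                         # [T, E]
        # Dense evaluation of every expert, kept for readability.
        expert_out = np.einsum("td,edv->tev", tokens, self.experts)     # [T, E, V]
        prediction = np.einsum("te,tev->tv", weights, expert_out)       # [T, V]
        return prediction, weights


# Toy sizes chosen arbitrarily for illustration.
T, d, V = 10, 16, 50
tokens = np.random.default_rng(1).normal(size=(T, d))
moe = SubjectAwareTopKMoE(d_model=d, n_voxels=V, n_experts=4, n_subjects=3, k=2)
pred, w = moe(tokens, subject_id=0)
print(pred.shape)  # (10, 50)
```

Because the decoder consumes only a `[T, d_model]` token sequence, any backbone whose fused output is mapped to that shape could in principle drive it, which is the kind of decoupling the abstract attributes to AFIRE. For readability, the sketch evaluates every expert densely; a genuinely sparse implementation would compute only the K selected experts for each token.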
