Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex
Muquan Yu, Mu Nan, Hossein Adeli, Jacob S. Prince, John A. Pyles, Leila Wehbe, Margaret M. Henderson, Michael J. Tarr, Andrew F. Luo
TL;DR
The paper tackles the challenge of building generalizable, image-computable encoders for human higher visual cortex under substantial inter-subject variability and data constraints. It proposes BraInCoRL, a transformer-based meta-learning framework that performs in-context learning to infer voxelwise encoding functions from a few stimuli without any finetuning on new subjects, leveraging cross-subject data and context from multiple images. The approach demonstrates strong data efficiency and cross-dataset generalization (NSD and BOLD5000), reveals interpretable attention to category-relevant stimuli, and enables language-driven, zero-shot mappings to voxel selectivity. Overall, BraInCoRL provides a foundation model for fMRI encoders that supports rapid, subject-specific cortical mapping and has potential applications in clinical mapping and brain–computer interfaces.
Abstract
Understanding functional representations within higher visual cortex is a fundamental question in computational neuroscience. While artificial neural networks pretrained on large-scale datasets exhibit striking representational alignment with human neural responses, learning image-computable models of visual cortex relies on individual-level, large-scale fMRI datasets. The necessity for expensive, time-intensive, and often impractical data acquisition limits the generalizability of encoders to new subjects and stimuli. BraInCoRL uses in-context learning to predict voxelwise neural responses from few-shot examples without any additional finetuning for novel subjects and stimuli. We leverage a transformer architecture that can flexibly condition on a variable number of in-context image stimuli, learning an inductive bias over multiple subjects. During training, we explicitly optimize the model for in-context learning. By jointly conditioning on image features and voxel activations, our model learns to directly generate better-performing voxelwise models of higher visual cortex. We demonstrate that BraInCoRL consistently outperforms existing voxelwise encoder designs in a low-data regime when evaluated on entirely novel images, while also exhibiting strong test-time scaling behavior. The model also generalizes to an entirely new visual fMRI dataset, which uses different subjects and fMRI data acquisition parameters. Further, BraInCoRL facilitates better interpretability of neural signals in higher visual cortex by attending to semantically relevant stimuli. Finally, we show that our framework enables interpretable mappings from natural language queries to voxel selectivity.
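The idea of conditioning on a variable number of (image feature, voxel activation) pairs can be illustrated with a toy example. The sketch below is not the BraInCoRL architecture: it replaces the trained transformer, which generates voxelwise encoder parameters, with a single untrained softmax-attention read-out over the context. The function name `in_context_predict`, the temperature `beta`, and the two-dimensional "image features" are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_context_predict(context_feats, context_resps, query_feat, beta=10.0):
    """Predict one voxel's response to a new image from in-context examples.

    context_feats: image feature vectors (assumed unit-normalised here).
    context_resps: that voxel's measured response to each context image.
    query_feat:    feature vector of the held-out image.
    beta:          hypothetical temperature; larger values sharpen attention.
    Returns (prediction, attention weights over the context images).
    """
    if not context_feats or len(context_feats) != len(context_resps):
        raise ValueError("need one response per context image, and at least one pair")
    # Query = new image, keys = context images, values = context responses.
    scores = [beta * dot(query_feat, f) for f in context_feats]
    attn = softmax(scores)
    pred = sum(a * r for a, r in zip(attn, context_resps))
    return pred, attn


# Toy context: three images and one voxel's response to each.
context_feats = [[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]]
context_resps = [2.0, -1.0, 0.5]
query = [1.0, 0.0]

pred, attn = in_context_predict(context_feats, context_resps, query)

# The context length is free to vary; no parameters are updated for a new voxel.
pred_one_shot, _ = in_context_predict(context_feats[:1], context_resps[:1], query)
```

In this toy setting the attention weights indicate which context images drive the prediction for a given voxel, loosely analogous to the interpretability analysis the abstract mentions. The actual model learns its attention and its weight-generating mapping by meta-training across subjects, which this sketch does not attempt.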
