Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding

Haomiao Chen, Keith W Jamison, Mert R. Sabuncu, Amy Kuceyeski

TL;DR

The paper tackles the inefficiency and limited generalization of traditional voxel-grid neural encoders by introducing the Neural Response Function (NRF), a coordinate-based implicit representation that predicts fMRI responses as a continuous function over standardized MNI space: the predicted response at coordinate $\mathbf{x}$ is $\Phi(M,\mathbf{x})$. NRF combines a multi-scale image feature extractor $G$ with a coordinate-conditioned MLP $P$, using Fourier-encoded coordinates to achieve resolution-agnostic predictions. The authors demonstrate data-efficient single-subject encoding and cross-subject transfer via fine-tuning and voxelwise ensembling, exploiting local anatomical smoothness and cross-subject alignment. Empirically, NRF outperforms baselines in low-data regimes, matches or exceeds them with full data, and supports flexible adaptation to new subjects with minimal data, offering a path toward an anatomically grounded, resolution-agnostic digital twin of the brain.
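As a rough illustration of the coordinate-conditioned design summarized above, the following NumPy sketch Fourier-encodes 3D coordinates and passes them, together with an image feature vector, through a small MLP. It is a minimal sketch under stated assumptions, not the authors' implementation: the octave-spaced frequencies, the layer sizes, the concatenation of image features with encoded coordinates, the random weights, and all function names (`fourier_encode`, `init_mlp`, `nrf_predict`) are hypothetical, and the random vector `image_feat` merely stands in for the output of the feature extractor $G$.

```python
import numpy as np

def fourier_encode(coords, num_bands=4):
    """Map (N, 3) coordinates to sin/cos features at octave-spaced frequencies."""
    coords = np.asarray(coords, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_bands)) * np.pi           # (B,), assumed schedule
    angles = coords[:, :, None] * freqs                      # (N, 3, B)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, 3, 2B)
    return feats.reshape(coords.shape[0], -1)                # (N, 6B)

def init_mlp(rng, sizes):
    """Random weights for a fully connected network with the given layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, h):
    """ReLU hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b

def nrf_predict(params, image_feat, coords, num_bands=4):
    """One predicted response per coordinate for a single image feature vector."""
    enc = fourier_encode(coords, num_bands)                  # (N, 6B)
    tiled = np.broadcast_to(image_feat, (enc.shape[0], image_feat.shape[0]))
    return mlp(params, np.concatenate([tiled, enc], axis=1))[:, 0]

rng = np.random.default_rng(0)
image_feat = rng.normal(size=16)            # stand-in for the output of G (assumption)
coords = rng.uniform(-1, 1, size=(5, 3))    # coordinates rescaled to [-1, 1] (assumption)
params = init_mlp(rng, [16 + 6 * 4, 32, 1])
pred = nrf_predict(params, image_feat, coords)
print(pred.shape)
```

Because the coordinates enter as ordinary inputs, the same weights can be evaluated at any set of locations, which is what makes the prediction independent of a fixed voxel grid.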

Abstract

Neural encoding models aim to predict fMRI-measured brain responses to natural images. fMRI data is acquired as a 3D volume of voxels, where each voxel has a defined spatial location in the brain. However, conventional encoding models often flatten this volume into a 1D vector and treat voxel responses as independent outputs. This removes spatial context, discards anatomical information, and ties each model to a subject-specific voxel grid. We introduce the Neural Response Function (NRF), a framework that models fMRI activity as a continuous function over anatomical space rather than a flat vector of voxels. NRF represents brain activity as a continuous implicit function: given an image and a spatial coordinate (x, y, z) in standardized MNI space, the model predicts the response at that location. This formulation decouples predictions from the training grid, supports querying at arbitrary spatial resolutions, and enables resolution-agnostic analyses. By grounding the model in anatomical space, NRF exploits two key properties of brain responses: (1) local smoothness -- neighboring voxels exhibit similar response patterns; modeling responses continuously captures these correlations and improves data efficiency, and (2) cross-subject alignment -- MNI coordinates unify data across individuals, allowing a model pretrained on one subject to be fine-tuned on new subjects. In experiments, NRF outperformed baseline models in both intrasubject encoding and cross-subject adaptation, achieving high performance while reducing the data size needed by orders of magnitude. To our knowledge, NRF is the first anatomically aware encoding model to move beyond flattened voxels, learning a continuous mapping from images to brain responses in 3D space.
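To make the idea of querying at arbitrary spatial resolutions concrete, here is a small NumPy sketch that samples one continuous function on a 2 mm grid and on a 1 mm grid over the same bounding box. Everything in it is an assumption for illustration: `toy_response` is a hypothetical smooth function standing in for a trained model, and the bounding box, grid spacings, and the helper `mni_grid` are arbitrary choices, not values from the paper.

```python
import numpy as np

def mni_grid(lo, hi, spacing_mm):
    """Return (N, 3) coordinates of a regular grid over [lo, hi] (in mm) and its 3D shape."""
    axes = [np.arange(l, h + 1e-9, spacing_mm) for l, h in zip(lo, hi)]
    xx, yy, zz = np.meshgrid(*axes, indexing="ij")
    coords = np.stack([xx.ravel(), yy.ravel(), zz.ravel()], axis=1)
    return coords, xx.shape

def toy_response(coords):
    """Hypothetical stand-in for a trained model: a smooth bump around one location."""
    center = np.array([10.0, -80.0, 4.0])
    sq_dist = np.sum((coords - center) ** 2, axis=1)
    return np.exp(-sq_dist / (2 * 8.0 ** 2))

lo, hi = (0.0, -90.0, 0.0), (20.0, -70.0, 8.0)   # arbitrary bounding box in mm

# The same function is queried on two grids; no retraining or resampling is involved.
coarse, coarse_shape = mni_grid(lo, hi, 2.0)
fine, fine_shape = mni_grid(lo, hi, 1.0)
vol_coarse = toy_response(coarse).reshape(coarse_shape)
vol_fine = toy_response(fine).reshape(fine_shape)

print(coarse_shape, fine_shape)
```

A model that outputs a fixed-length voxel vector would need interpolation to move between these grids, whereas a coordinate-based function is simply evaluated at the new locations.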

Paper Structure

This paper contains 37 sections, 2 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Overview of NRF. Top: Individual-subject NRF. Brain responses are modeled as a continuous function of both image features and anatomical coordinates in MNI space. NRF learns to map image features to voxel responses while capturing correlations between neighboring voxels through their anatomical coordinates. Bottom: New-subject adaptation. 1) For novel subjects, the learned representation can be transferred by fine-tuning pretrained NRFs with limited data. 2) Predictions from multiple fine-tuned base models are combined via voxelwise ensembling to capture individual variability. NRF thus moves beyond grid-locked voxel models, offering a continuous, anatomically grounded representation that enables both data-efficient single-subject encoding and flexible cross-subject transfer.
  • Figure 2: Prediction accuracy (Pearson correlation) in the low-data regime. a. Single-subject models. NRF consistently outperforms baseline models when trained from scratch on limited samples, highlighting the benefit of its continuous mapping. Results show the median voxel correlation averaged across four subjects, with error bars indicating the standard error of the mean (SEM). b. Cross-subject transfer. Voxel-level prediction accuracy visualized on the cortical surface of subject 7. When pretrained base models from other subjects are available, the NRF fine-tuned ensemble further improves performance over NRF trained from scratch and over the baselines, showing clear gains across visual regions.
  • Figure 3: Visual comparison of image reconstructions from different neural encoding models and NRF. GT = image seen during data collection. Measured fMRI = image decoded from measured fMRI. Reconstructions from NRF-predicted responses preserve both low-level visual details and high-level semantic content of the stimuli. Results shown for Subject 1.
  • Figure 4: Probing anatomical awareness in NRF. (a) Disrupting spatial smoothness by shuffling coordinate–response pairings reduced accuracy, especially in low-data regimes, confirming that NRF relies on local continuity in brain responses. (b)(c) Breaking cross-subject alignment by shifting MNI coordinates degraded transfer, with the largest effect under limited data, showing that anatomical correspondence is critical for efficient adaptation.
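The evaluation quantities named in the captions above (per-voxel Pearson correlation, its median over voxels, and the SEM across subjects) and the voxelwise ensembling step of Figure 1 can be sketched in NumPy as follows. The metric functions follow the standard definitions. The combination rule in `voxelwise_ensemble`, a separate least-squares weighting of the base models for each voxel fitted on held-out data, is an assumption chosen for illustration, since the captions do not specify how the ensemble weights are obtained. All data here are synthetic and all names are hypothetical.

```python
import numpy as np

def voxelwise_pearson(pred, target):
    """Pearson r per voxel; pred and target have shape (n_images, n_voxels)."""
    p = pred - pred.mean(axis=0)
    t = target - target.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    return (p * t).sum(axis=0) / np.maximum(denom, 1e-12)

def mean_and_sem(values):
    """Mean and standard error of the mean, e.g. over per-subject median correlations."""
    values = np.asarray(values, dtype=np.float64)
    return values.mean(), values.std(ddof=1) / np.sqrt(values.size)

def voxelwise_ensemble(val_preds, val_target, test_preds):
    """Combine base models with per-voxel least-squares weights (assumed rule).

    val_preds, test_preds: (n_models, n_images, n_voxels); val_target: (n_images, n_voxels).
    """
    n_vox = val_preds.shape[2]
    out = np.empty(test_preds.shape[1:])
    for v in range(n_vox):
        A = val_preds[:, :, v].T                              # (n_val, n_models)
        w, *_ = np.linalg.lstsq(A, val_target[:, v], rcond=None)
        out[:, v] = test_preds[:, :, v].T @ w                 # weighted sum for voxel v
    return out

# Synthetic demonstration: two hypothetical base models with different noise levels.
rng = np.random.default_rng(0)
target_val = rng.normal(size=(50, 6))
target_test = rng.normal(size=(20, 6))
val_preds = np.stack([target_val + s * rng.normal(size=(50, 6)) for s in (0.5, 1.0)])
test_preds = np.stack([target_test + s * rng.normal(size=(20, 6)) for s in (0.5, 1.0)])

ensemble = voxelwise_ensemble(val_preds, target_val, test_preds)
r = voxelwise_pearson(ensemble, target_test)
print("median voxel r:", np.median(r))
```

Fitting the weights separately for each voxel lets different base models dominate in different regions, which is one plausible way to capture the individual variability mentioned in Figure 1.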