The Wisdom of a Crowd of Brains: A Universal Brain Encoder

Roman Beliy, Navve Wasserman, Amit Zalcher, Michal Irani

TL;DR

The paper addresses a bottleneck in image-to-fMRI encoding: models are typically trained per subject and per dataset. It introduces a Universal Brain-Encoder built on a voxel-centric architecture that learns a 256-dimensional embedding for each voxel. The model trains jointly on diverse subjects and datasets, using a shared image-feature extractor and a cross-attention mechanism that links each voxel's functionality to multi-scale image features. This crowd-based approach improves encoding performance, enables transfer learning to new subjects with few training examples, and reveals functionally meaningful voxel clusters that map to shared brain functions without requiring exact anatomical alignment. The result is higher encoding accuracy across datasets and a scalable framework for exploring brain functionality at voxel granularity, which broadens the applicability of brain-encoding models for neuroscience research and potential clinical use.

Abstract

Image-to-fMRI encoding is important for both neuroscience research and practical applications. However, such "Brain-Encoders" have typically been trained per-subject and per fMRI-dataset, and are thus restricted to very limited training data. In this paper we propose a Universal Brain-Encoder, which can be trained jointly on data from many different subjects/datasets/machines. What makes this possible is our new voxel-centric Encoder architecture, which learns a unique "voxel-embedding" per brain-voxel. Our Encoder trains to predict the response of each brain-voxel on every image, by directly computing the cross-attention between the brain-voxel embedding and multi-level deep image features. This voxel-centric architecture allows the functional role of each brain-voxel to naturally emerge from the voxel-image cross-attention. We show the power of this approach to (i) combine data from multiple different subjects (a "Crowd of Brains") to improve each individual brain-encoding, (ii) perform quick & effective Transfer-Learning across subjects, datasets, and machines (e.g., 3-Tesla, 7-Tesla), with few training examples, and (iii) use the learned voxel-embeddings as a powerful tool to explore brain functionality (e.g., what is encoded where in the brain).
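
The voxel-image cross-attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, uses the image feature tokens as both keys and values, and maps the attended features to a scalar with a hypothetical linear `readout` vector. All function names, sizes, and random values are illustrative assumptions.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_voxel_response(voxel_embedding, image_tokens, readout):
    """Single-head cross-attention (assumed form, not the paper's exact block).

    The voxel embedding acts as the query; each image feature token acts as
    both key and value. Returns the scalar prediction and the attention weights.
    """
    scale = math.sqrt(len(voxel_embedding))
    scores = [dot(voxel_embedding, tok) / scale for tok in image_tokens]
    weights = softmax(scores)
    dim = len(image_tokens[0])
    # Attention-weighted average of the image tokens.
    attended = [
        sum(w * tok[i] for w, tok in zip(weights, image_tokens))
        for i in range(dim)
    ]
    # Hypothetical linear readout from attended features to one activation.
    return dot(readout, attended), weights


rng = random.Random(0)
DIM = 8       # toy size; the TL;DR mentions 256-dimensional voxel embeddings
N_TOKENS = 5  # assumed number of image feature tokens
voxel_embedding = [rng.gauss(0, 1) for _ in range(DIM)]
image_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]
readout = [rng.gauss(0, 1) for _ in range(DIM)]

response, weights = predict_voxel_response(voxel_embedding, image_tokens, readout)
print(f"predicted response: {response:.3f}")
print("attention weights:", [round(w, 3) for w in weights])
```

In this sketch the attention weights show which image tokens a given voxel draws on, which is one way to read the claim that a voxel's functional role emerges from the cross-attention. In a jointly trained model the image-side components would be shared across subjects, with only the voxel embeddings specific to each brain.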

Paper Structure (30 sections, 18 figures, 5 tables)

This paper contains 30 sections, 18 figures, 5 tables.

Figures (18)

  • Figure 1: Overview. The Universal Image-to-fMRI Brain-Encoder trains jointly on multiple subjects & datasets. It learns to predict fMRI activation of each brain-voxel on any image, via cross-attention between learned brain-voxel embeddings and deep image features.
  • Figure 2: Universal-Encoder Architecture. Input: an image & a brain-voxel index (a pointer to its Voxel-Embedding vector); Output: The predicted fMRI activation of this brain-voxel on that image. The model has 3 main components: (a) Feature Extraction Block -- extracts multi-scale (DINO-adapted) image features; (b) Learned Voxel-Embedding -- captures the unique functionality of each voxel; (c) Cross-Attention Block -- establishes the connection between voxel-functionality and relevant image features.
  • Figure 3: Qualitative Evaluation of the Universal-Encoder. (a) Visual comparison of Real vs. Encoder-predicted fMRIs for 3 test images. (b) Top 5 retrieved images for each "Query" test-fMRI. (See text for more details)
  • Figure 4: The Wisdom of a Crowd of Brains. By aggregating data from multiple subjects, our Universal-Encoder improves encoding of any individual subject. We compared 3 models: (i) "Baseline" single-subject encoder of Gaziv et al. (2022), (ii) "Universal Encoder - single subject" -- our architecture trained on each subject separately, (iii) "Universal Encoder - multiple subjects" -- our model trained on data from 8 subjects. (a) Pearson Correlation (per voxel) between predicted & ground-truth fMRI (Median value, 75th & 25th percentiles). (b) Retrieval Accuracy (Top-1 & Top-5) of the GT image per "Query" fMRI.
  • Figure 5: The Wisdom of the Crowd of Datasets: Using data from a high-quality 7T dataset (NSD) significantly enhances the encoding performance in lower-quality (3T & 4T) datasets.
  • ...and 13 more figures
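
The two measures named in the Figure 4 caption, per-voxel Pearson correlation and Top-k retrieval accuracy, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes retrieval ranks candidate images by the correlation between the query fMRI and each candidate's encoder-predicted fMRI, and the toy arrays are made up for demonstration.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def per_voxel_correlation(predicted, measured):
    """One correlation per voxel, computed across test images.

    Both inputs are lists of fMRI vectors, indexed [image][voxel].
    """
    n_voxels = len(predicted[0])
    return [
        pearson([p[v] for p in predicted], [m[v] for m in measured])
        for v in range(n_voxels)
    ]


def top_k_accuracy(predicted, measured, k):
    """Fraction of query fMRIs whose ground-truth image ranks within the top k.

    Assumption: candidates are ranked by correlation between the measured
    query fMRI and the encoder-predicted fMRI of each candidate image.
    """
    hits = 0
    for i, query in enumerate(measured):
        scores = [pearson(query, candidate) for candidate in predicted]
        ranked = sorted(range(len(predicted)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(measured)


# Made-up toy data: 3 test images, 4 voxels each.
measured = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [2.0, 4.0, 1.0, 3.0]]
predicted = [[1.1, 2.0, 2.9, 4.2], [3.8, 3.1, 2.2, 0.9], [2.1, 3.7, 1.2, 3.0]]

voxel_r = per_voxel_correlation(predicted, measured)
print("median per-voxel r:", round(median(voxel_r), 3))
print("top-1 accuracy:", top_k_accuracy(predicted, measured, k=1))
print("top-2 accuracy:", top_k_accuracy(predicted, measured, k=2))
```

The caption's median and 25th/75th percentiles would be taken over the list that `per_voxel_correlation` returns; `statistics.quantiles(voxel_r, n=4)` is one standard-library way to obtain the quartiles.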