Duala: Dual-Level Alignment of Subjects and Stimuli for Cross-Subject fMRI Decoding

Shumeng Li, Jintao Guo, Jian Zhang, Yulin Zhou, Luyang Cao, Yinghuan Shi

TL;DR

Duala is a dual-level alignment framework for fMRI-based cross-subject visual decoding that enforces stimulus-level consistency and subject-level alignment. Fine-tuned on only about one hour of fMRI data from a new subject, it achieves over 81.1% image-to-brain retrieval accuracy and consistently outperforms existing fine-tuning strategies in both retrieval and reconstruction.

Abstract

Cross-subject visual decoding aims to reconstruct visual experiences from brain activity across individuals, enabling more scalable and practical brain-computer interfaces. However, existing methods often suffer from degraded performance when adapting to new subjects with limited data, as they struggle to preserve both the semantic consistency of stimuli and the alignment of brain responses. To address these challenges, we propose Duala, a dual-level alignment framework designed to achieve stimulus-level consistency and subject-level alignment in fMRI-based cross-subject visual decoding. (1) At the stimulus level, Duala introduces a semantic alignment and relational consistency strategy that preserves intra-class similarity and inter-class separability, maintaining clear semantic boundaries during adaptation. (2) At the subject level, a distribution-based feature perturbation mechanism is developed to capture both global and subject-specific variations, enabling adaptation to individual neural representations without overfitting. Experiments on the Natural Scenes Dataset (NSD) demonstrate that Duala effectively improves alignment across subjects. Remarkably, even when fine-tuned with only about one hour of fMRI data, Duala achieves over 81.1% image-to-brain retrieval accuracy and consistently outperforms existing fine-tuning strategies in both retrieval and reconstruction. Our code is available at https://github.com/ShumengLI/Duala.
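
The subject-level idea in (2) can be illustrated with a small sketch. The abstract does not give the actual formulation, so the code below is only one plausible reading: it treats the mean and standard deviation of each fMRI feature vector as its "distribution", estimates how much those statistics vary across the batch, and resamples them within that spread. The function name `perturb_features` and the `strength` parameter are hypothetical and do not come from the paper or its repository.

```python
import numpy as np

def perturb_features(features, strength=0.1, rng=None):
    """Perturb per-sample feature statistics (illustrative sketch, not the paper's method).

    features: array of shape (batch, dim) holding fMRI feature vectors.
    strength: hypothetical knob scaling how far the statistics may move.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = 1e-6

    # Per-sample statistics over the feature dimension.
    mu = features.mean(axis=1, keepdims=True)            # (batch, 1)
    sigma = features.std(axis=1, keepdims=True) + eps    # (batch, 1)

    # How much those statistics vary across the batch.
    mu_spread = mu.std(axis=0, keepdims=True)
    sigma_spread = sigma.std(axis=0, keepdims=True)

    # Resample the statistics around their original values.
    new_mu = mu + strength * mu_spread * rng.standard_normal(mu.shape)
    new_sigma = sigma + strength * sigma_spread * rng.standard_normal(sigma.shape)
    new_sigma = np.maximum(new_sigma, eps)  # keep the scale positive

    # Normalize with the old statistics, then re-scale with the perturbed ones.
    normalized = (features - mu) / sigma
    return normalized * new_sigma + new_mu
```

With `strength=0.0` the function returns its input unchanged, so the perturbation can be switched off at evaluation time; larger values expose the model to a wider range of feature statistics during fine-tuning.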

Paper Structure

This paper contains 16 sections, 9 equations, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: Effect of fine-tuning on retrieval performance, with t-SNE visualizations. The pre-trained model achieves high retrieval performance, indicating strong generalization to unseen stimuli. After fine-tuning on data from a new subject, however, retrieval performance drops significantly. The t-SNE visualization of the pre-trained model shows clear class boundaries for the subject’s stimuli, whereas that of the model fine-tuned on the new subject shows blurred class boundaries, indicating that fine-tuning does not preserve the semantic structure for the new subject.
  • Figure 2: Different subjects are presented with different visual stimuli, even when the stimuli belong to the same category. For example, each subject is shown an image of a cat, but the specific photo differs from subject to subject.
  • Figure 3: Overview of our Duala framework. Stimulus-level Semantic Preservation maintains the semantic structure of visual representations, and Subject-level Distribution Perturbation enhances cross-subject adaptability.
  • Figure 4: Semantic Alignment Constraint. We sample an anchor and a positive fMRI response elicited by images from the same category (Dog), and a negative from a different category (Giraffe). The constraint encourages fMRI representations of the same stimulus class to be more similar than those of different classes within a subject, preserving intra-class discriminability.
  • Figure 5: Relational Consistency across subjects. Although subjects view different images even within the same category (e.g., different birds / buses / clocks), the similarity structure among categories should remain consistent across subjects.
  • ...and 3 more figures
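
The captions of Figures 4 and 5 describe two stimulus-level constraints: an anchor/positive/negative comparison within a subject, and a category similarity structure that should match across subjects. The paper's equations are not reproduced in this summary, so the following is a hedged sketch under stated assumptions: a cosine-similarity triplet loss with a hypothetical `margin`, and a mean squared difference between class-prototype similarity matrices. All function names here are illustrative and are not taken from the Duala code.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss asking sim(anchor, positive) to exceed sim(anchor, negative) by `margin`.

    All inputs have shape (batch, dim). The margin value is an assumption.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    sim_pos = np.sum(a * p, axis=-1)
    sim_neg = np.sum(a * n, axis=-1)
    return float(np.mean(np.maximum(0.0, margin - sim_pos + sim_neg)))

def class_similarity(features, labels, num_classes):
    """Cosine similarity between class prototypes (mean feature per class).

    Assumes every class in range(num_classes) appears at least once in `labels`.
    """
    prototypes = np.stack(
        [features[labels == c].mean(axis=0) for c in range(num_classes)]
    )
    prototypes = l2_normalize(prototypes)
    return prototypes @ prototypes.T

def relational_consistency_loss(feats_a, labels_a, feats_b, labels_b, num_classes):
    """Mean squared difference between two subjects' class-similarity matrices."""
    sim_a = class_similarity(feats_a, labels_a, num_classes)
    sim_b = class_similarity(feats_b, labels_b, num_classes)
    return float(np.mean((sim_a - sim_b) ** 2))
```

Comparing similarity matrices rather than raw features fits the setting in Figure 5: the two subjects never need to see the same image, only the same categories, because the loss depends on how categories relate to each other within each subject.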