ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding

Haonan Wang, Jingyu Lu, Hongrui Li, Xiaomeng Li

TL;DR

Zebra tackles zero-shot cross-subject brain decoding by disentangling fMRI representations into subject-invariant and semantic-specific components using residual decomposition and adversarial training. The method couples a ViT-based brain encoder with a diffusion prior and introduces Subject-Invariant Feature Extraction (SIFE) and Semantic-Specific Feature Extraction (SSFE) to learn universal semantics while suppressing subject-specific noise. Through adversarial objectives and preservation anchors, Zebra achieves zero-shot generalization to unseen subjects and attains performance close to finetuned models on multiple metrics, demonstrating scalable, real-world potential for brain decoding. The approach advances practical neural decoding by reducing the need for subject-specific data, enabling faster, more accessible brain-computer interface applications, while outlining future work on semantic fidelity and broader applicability.

Abstract

Recent advances in neural decoding have enabled the reconstruction of visual experiences from brain activity, positioning fMRI-to-image reconstruction as a promising bridge between neuroscience and computer vision. However, current methods predominantly rely on subject-specific models or require subject-specific fine-tuning, limiting their scalability and real-world applicability. In this work, we introduce ZEBRA, the first zero-shot brain visual decoding framework that eliminates the need for subject-specific adaptation. ZEBRA is built on the key insight that fMRI representations can be decomposed into subject-related and semantic-related components. By leveraging adversarial training, our method explicitly disentangles these components to isolate subject-invariant, semantic-specific representations. This disentanglement allows ZEBRA to generalize to unseen subjects without any additional fMRI data or retraining. Extensive experiments show that ZEBRA significantly outperforms zero-shot baselines and achieves performance comparable to fully finetuned models on several metrics. Our work represents a scalable and practical step toward universal neural decoding. Code and model weights are available at: https://github.com/xmed-lab/ZEBRA.

Paper Structure

This paper contains 13 sections, 5 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: (a) Previous methods (MindEye2, NeuroPictor, MindTuner) typically involve two training stages: (1) pretraining a brain model with multiple subjects, and (2) fine-tuning the model for a specific subject. In this approach, the test subject is known to the model, which limits its zero-shot capability for new subjects. (b) In contrast, Zebra eliminates the fine-tuning stage and trains only once on the training subjects. This allows it to perform zero-shot inference on unseen subjects while achieving performance comparable to the fine-tuned approaches.
  • Figure 2: Core idea of Zebra. $\bm{F}_s$ is used as diffusion prior guidance.
  • Figure 3: Zebra consists of two key components: (1) Subject-Invariant Feature Extraction (SIFE), which disentangles subject-invariant representations from brain activity using adversarial learning and residual decomposition; and (2) Semantic-Specific Feature Extraction (SSFE), which aligns semantic information in brain features with vision-language embeddings via supervised learning and gradient reversal. During inference, only the invariant projection path is used, enabling zero-shot generalization to unseen subjects.
  • Figure 4: Representation Preservation Anchor of SSFE.
  • Figure 5: Qualitative comparison between Zebra and zero-shot implementations of NeuroPictor and MindEye2 (1h).
  • ...and 2 more figures
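
The Figure 3 caption names two mechanisms, residual decomposition and gradient reversal, that can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes the invariant feature is the encoder feature minus an estimated subject component, and it uses a hypothetical linear subject classifier. All function names, the weights, and the `lam` scaling factor are illustrative assumptions.

```python
import math

def residual_decompose(feature, subject_component):
    # Assumed form: invariant part = feature - estimated subject part.
    return [f - s for f, s in zip(feature, subject_component)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def subject_loss(feature, weights, subject_id):
    # Cross-entropy of a (hypothetical) linear subject classifier.
    logits = [sum(w * f for w, f in zip(row, feature)) for row in weights]
    return -math.log(softmax(logits)[subject_id])

def subject_loss_grad(feature, weights, subject_id):
    # Gradient of the cross-entropy loss with respect to the input feature.
    logits = [sum(w * f for w, f in zip(row, feature)) for row in weights]
    probs = softmax(logits)
    # dL/dlogit_k = p_k - 1[k == subject_id]
    dlogits = [p - (1.0 if k == subject_id else 0.0) for k, p in enumerate(probs)]
    return [sum(dlogits[k] * weights[k][j] for k in range(len(weights)))
            for j in range(len(feature))]

def gradient_reversal(grad, lam=1.0):
    # Backward rule of a gradient reversal layer: the forward pass is the
    # identity, the backward pass multiplies the gradient by -lam.
    return [-lam * g for g in grad]

# Toy values (made up for illustration only).
feature = [0.5, -1.0, 2.0]
subject_component = [0.1, 0.2, -0.3]
weights = [[1.0, 0.0, 0.5], [-0.5, 1.0, 0.0]]  # 2 subjects, 3 feature dims

invariant = residual_decompose(feature, subject_component)
grad = subject_loss_grad(invariant, weights, subject_id=0)
reversed_grad = gradient_reversal(grad, lam=0.5)

# A descent step on the feature using the reversed gradient moves it in the
# direction that makes the subject classifier worse.
lr = 0.1
updated = [f - lr * g for f, g in zip(invariant, reversed_grad)]
loss_before = subject_loss(invariant, weights, subject_id=0)
loss_after = subject_loss(updated, weights, subject_id=0)
```

In this toy setting the classifier itself would still be trained with the unreversed gradient, while the feature extractor receives the reversed one. That opposition is what pushes the features toward carrying less subject-identifying information.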