Shrinking the Teacher: An Adaptive Teaching Paradigm for Asymmetric EEG-Vision Alignment

Lukun Wu, Jie Li, Ziqi Ren, Kaifan Zhang, Xinbo Gao

TL;DR

This work reframes vision-to-EEG alignment as an inherently asymmetric problem, decomposing the gap into Fidelity and Semantic components. It introduces the Adaptive Teaching System (ATS), where a pretrained visual encoder (teacher) is dynamically shrunk via the ShrinkAdapter to better match the EEG (student) capacity, and is complemented by the Shared Temporal Attention Encoder (STAE) to denoise temporal EEG signals. Through extensive zero-shot brain-to-image retrieval experiments on THINGS-EEG/THINGS-MEG, ATS achieves state-of-the-art performance (e.g., Top-1 60.2% on THINGS-EEG, +9.8% over previous SOTA) and demonstrates the importance of allowing the teacher to adapt while focusing on informative temporal segments. The results underscore a new paradigm for asymmetric cross-modal alignment with potential implications for neuroimaging-based retrieval and brain-computer interfaces.

Abstract

Decoding visual features from EEG signals is a central challenge in neuroscience, with cross-modal alignment as the dominant approach. We argue that the relationship between visual and brain modalities is fundamentally asymmetric, characterized by two critical gaps: a Fidelity Gap (stemming from EEG's inherent noise and signal degradation, vs. vision's high-fidelity features) and a Semantic Gap (arising from EEG's shallow conceptual representation, vs. vision's rich semantic depth). Previous methods often overlook this asymmetry, forcing alignment between the two modalities as if they were equal partners and thereby leading to poor generalization. To address this, we propose the adaptive teaching paradigm. This paradigm empowers the "teacher" modality (vision) to dynamically shrink and adjust its knowledge structure under task guidance, tailoring its semantically dense features to match the capacity of the "student" modality (EEG). We implement this paradigm with the ShrinkAdapter, a simple yet effective module featuring a residual-free design and a bottleneck structure. Through extensive experiments, we validate the underlying rationale and effectiveness of our paradigm. Our method achieves a top-1 accuracy of 60.2% on the zero-shot brain-to-image retrieval task, surpassing previous state-of-the-art methods by a margin of 9.8%. Our work introduces a new perspective for asymmetric alignment: the teacher must shrink and adapt to bridge the vision-brain gap.
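
To make the "residual-free design and a bottleneck structure" concrete, here is a minimal NumPy sketch of what such an adapter could look like. It is an illustration under stated assumptions, not the authors' implementation: the class name `BottleneckAdapter`, the feature dimensions, the GELU activation, and the initialization are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of GELU. The activation choice is an assumption.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class BottleneckAdapter:
    """Hypothetical residual-free bottleneck adapter (illustrative only)."""

    def __init__(self, in_dim, bottleneck_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Down-projection squeezes teacher features into a narrow space.
        self.w_down = rng.normal(0.0, in_dim ** -0.5, size=(in_dim, bottleneck_dim))
        self.b_down = np.zeros(bottleneck_dim)
        # Up-projection maps the squeezed features to the alignment space.
        self.w_up = rng.normal(0.0, bottleneck_dim ** -0.5, size=(bottleneck_dim, out_dim))
        self.b_up = np.zeros(out_dim)

    def __call__(self, x):
        hidden = gelu(x @ self.w_down + self.b_down)
        # Residual-free: the input x is NOT added back, so everything the
        # student sees must pass through the bottleneck.
        return hidden @ self.w_up + self.b_up


# Assumed shapes: 4 images with 1024-dimensional teacher features.
teacher_features = np.random.default_rng(1).normal(size=(4, 1024))
adapter = BottleneckAdapter(in_dim=1024, bottleneck_dim=128, out_dim=1024)
adapted_features = adapter(teacher_features)
```

Because there is no skip connection, the adapted features are confined to a subspace no larger than the bottleneck width, which is one plausible reading of how the teacher is "shrunk" toward the student's capacity.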

Paper Structure

This paper contains 49 sections, 7 equations, 12 figures, 34 tables, 1 algorithm.

Figures (12)

  • Figure 1: From Forced Alignment to Adaptive Teaching: A paradigm shift for asymmetric modality alignment.
  • Figure 1: The 17 selected EEG channels from the occipital and parietal lobes used in our study, highlighted in orange. The layout follows the standard 10-10 system.
  • Figure 2: The physiological basis for our motivation: Deconstructing the asymmetric modality gap between vision and EEG into Fidelity Gap and Semantic Gap.
  • Figure 2: Architecture of the Shared Temporal Attention Encoder (STAE).
  • Figure 3: Overview of the Adaptive Teaching System for zero-shot EEG-to-Image retrieval. Training: A ShrinkAdapter enables the visual "teacher" to adapt its features for alignment with the EEG "student" via InfoNCE loss. Testing: The trained student encoder performs zero-shot image retrieval from a candidate set.
  • ...and 7 more figures
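
The Figure 3 caption describes training with an InfoNCE loss and testing by zero-shot retrieval from a candidate set. The sketch below shows one common way to write both steps in NumPy. It is an assumption-laden illustration rather than the paper's code: the symmetric form of the loss, the temperature of 0.07, the cosine-similarity ranking, and the function names `info_nce` and `top_k_retrieval` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE; row i of each array is assumed to be a matched pair."""
    logits = l2_normalize(eeg_emb) @ l2_normalize(img_emb).T / temperature
    eeg_to_img = -np.diag(log_softmax(logits, axis=1)).mean()
    img_to_eeg = -np.diag(log_softmax(logits, axis=0)).mean()
    return 0.5 * (eeg_to_img + img_to_eeg)


def top_k_retrieval(eeg_emb, candidate_img_emb, k=1):
    """Return indices of the k most similar candidate images per EEG trial."""
    sims = l2_normalize(eeg_emb) @ l2_normalize(candidate_img_emb).T
    return np.argsort(-sims, axis=1)[:, :k]


# Toy data: EEG embeddings are noisy copies of their paired image embeddings.
rng = np.random.default_rng(0)
img_emb = rng.normal(size=(5, 16))
eeg_emb = img_emb + 0.01 * rng.normal(size=(5, 16))

loss_matched = info_nce(eeg_emb, img_emb)
loss_shuffled = info_nce(np.roll(eeg_emb, 1, axis=0), img_emb)
top1 = top_k_retrieval(eeg_emb, img_emb, k=1)
```

On this toy data the matched pairing yields a lower loss than the shuffled one, and each EEG embedding retrieves its own image at rank 1. In the setting the caption describes, `img_emb` would come from the adapted teacher during training, and the candidate set at test time would contain images from unseen classes.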