Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection

Zixing Li; Chao Yan; Zhen Lan; Xiaojia Xiang; Han Zhou; Jun Lai; Dengqing Tang

TL;DR

The paper tackles dim object detection in aerial imagery under few-shot conditions by coupling event-related potential (ERP) signals from the brain with computer-vision features. It introduces a brain-eye-computer based detection system and an Adaptive Modality Balanced Online Knowledge Distillation (AMBOKD) framework, which trains the EEG and image modalities end to end through mutual learning, using a multi-head attention fusion module and dynamic balancing of the learning signals. The contributions are the AMBOKD method, the ESSVP dataset (described as the first public EEG-visual paired dataset for aerial-like targets), and experiments on ESSVP, CIFAR-100, and SEED-VIG, together with real-world system validation, in which the method is reported to outperform existing approaches in accuracy and robustness. The work advances multimodal fusion and online knowledge transfer in scarce-data regimes, with practical implications for dim-target detection in UAV-based applications.

Abstract

Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which offer efficient feature extraction, can achieve more robust and accurate detection of dim targets in aerial images. However, existing target detection methods primarily concentrate on homogeneous data and lack efficient, versatile processing capabilities for heterogeneous multimodal data. In this paper, we first build a brain-eye-computer based object detection system for aerial images under few-shot conditions. This system detects suspicious targets using region proposal networks, evokes the event-related potential (ERP) signal in the electroencephalogram (EEG) through the eye-tracking-based slow serial visual presentation (ESSVP) paradigm, and constructs EEG-image data pairs with the help of eye movement data. Then, an adaptive modality balanced online knowledge distillation (AMBOKD) method is proposed to recognize dim objects from the EEG-image data. AMBOKD fuses EEG and image features using a multi-head attention module, establishing a new modality with comprehensive features. To improve the performance and robustness of the fusion modality, end-to-end online knowledge distillation enables simultaneous training and mutual learning between modalities. During learning, an adaptive modality balancing module maintains multimodal equilibrium by dynamically adjusting the importance weights and the training gradients of the individual modalities. The effectiveness of our method is demonstrated by comparison with existing state-of-the-art methods. Additionally, experiments on public datasets and system validations in real-world scenarios demonstrate the reliability and practicality of the proposed system and method.
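
The mutual-learning idea in the abstract can be illustrated with a small sketch. The code below is not the paper's formulation: it assumes three classifier branches (EEG, image, fusion) that each emit logits, uses a standard temperature-scaled KL distillation term between every pair of branches, and weights each teacher with a hypothetical rule (softmax over negative cross-entropy) as a stand-in for the adaptive balancing module. The gradient adjustment mentioned in the abstract is not modeled, and all function names are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label] + eps)

def balance_weights(ce_losses):
    """Hypothetical balancing rule (an assumption, not from the paper):
    branches with lower cross-entropy get a larger teacher weight."""
    names = list(ce_losses)
    weights = softmax([-ce_losses[n] for n in names])
    return dict(zip(names, weights))

def mutual_distillation_losses(logits, label, temperature=2.0):
    """Per-branch loss: own cross-entropy plus weighted KL to every peer."""
    hard = {n: cross_entropy(softmax(z), label) for n, z in logits.items()}
    soft = {n: softmax(z, temperature) for n, z in logits.items()}
    weights = balance_weights(hard)
    total = {}
    for student in logits:
        distill = sum(
            weights[teacher] * kl_divergence(soft[teacher], soft[student])
            for teacher in logits
            if teacher != student
        )
        # The T^2 factor is the usual rescaling for temperature-softened targets.
        total[student] = hard[student] + temperature ** 2 * distill
    return total, weights

# Toy two-class example; the logits are made-up numbers, not experimental data.
example_logits = {"eeg": [0.2, 0.1], "image": [1.5, -0.5], "fusion": [2.0, -1.0]}
losses, weights = mutual_distillation_losses(example_logits, label=0)
```

In this toy example the fusion branch has the lowest cross-entropy, so it receives the largest teacher weight and the weaker EEG branch is pulled most strongly toward it. In an actual training loop each per-branch loss would be backpropagated through its own encoder, with peer predictions typically treated as constants.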

Paper Structure (34 sections, 14 equations, 11 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Brain-Computer Based Object Recognition
Multimodal Learning
Knowledge Distillation
System Design and Data Collection
System Design
Suspicious Region Detection
ESSVP Paradigm
Data Collection and Preprocessing
Methodology
Extraction of Modality Representation
Visual Encoder
EEG Encoder
Fusion of Modality Representation
...and 19 more sections

Figures (11)

Figure 1: Brain-eye-computer based object detection system with our proposed AMBOKD method.
Figure 2: The data acquisition process of the brain-eye-computer based object detection system. First, the captured image data are pre-processed by the feature encoder and the region proposal network to obtain images with region proposals. Then, the ESSVP paradigm is used to elicit ERPs in the EEG signals. Next, the EEG cap collects the EEG signals and sends them to the computer through the signal amplifier. Finally, the EEG and image data are paired using the eye-tracking data and the time-stamp synchronization signal from the eye tracker and the synchronization box.
Figure 3: Examples of stimulus materials.
Figure 4:
Figure 5: The effect of noise on the image data of the validation set.
...and 6 more figures
