CMCRD: Cross-Modal Contrastive Representation Distillation for Emotion Recognition

Siyuan Kan; Huanyu Wu; Zhenyao Cui; Fan Huang; Xiaolong Xu; Dongrui Wu

TL;DR

CMCRD targets emotion recognition when only a single modality is available at test time, by distilling knowledge from an eye movement (EM) teacher into an EEG student, or vice versa. It introduces a cross-modal contrastive representation distillation framework that uses minimum class confusion for teacher training and a mutual-information-based CMCRD loss for the student, with sampling weights derived from the entropy of the teacher's predictions. Evaluated on SEED, SEED-IV, and SEED-V with three backbone networks, CMCRD improves average accuracy by about 6.2% over EEG-only baselines and outperforms other distillation methods, while reducing hardware and data collection requirements at deployment. The results support the practical potential of cross-modal distillation and suggest extensions to regression tasks, semi-supervised settings, and domain adaptation.

Abstract

Emotion recognition is an important component of affective computing and human-machine interaction. Unimodal emotion recognition is convenient, but its accuracy may not be high enough; in contrast, multi-modal emotion recognition may be more accurate, but it increases the complexity and cost of the data collection system. This paper considers cross-modal emotion recognition, i.e., using both electroencephalography (EEG) and eye movement signals in training, but only EEG or only eye movement signals at test time. We propose cross-modal contrastive representation distillation (CMCRD), which uses a pre-trained eye movement classification model to assist the training of an EEG classification model, improving feature extraction from EEG signals, or vice versa. At test time, only EEG signals (or only eye movement signals) are acquired, eliminating the need for multi-modal data. CMCRD not only improves emotion recognition accuracy, but also makes the system simpler and more practical. Experiments using three different neural network architectures on three multi-modal emotion recognition datasets demonstrated the effectiveness of CMCRD. Compared with the EEG-only model, it improved the average classification accuracy by about 6.2%.
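To make the teacher-to-student idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes an InfoNCE-style contrastive loss (a commonly used lower bound on mutual information) as a stand-in for the CMCRD loss, and it assumes that low-entropy, confident teacher predictions should receive larger sample weights. The function names, the weight formula, and the temperature value are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weights(teacher_logits):
    """Assumed weighting: w = 1 - H / log(C), rescaled to have mean 1.

    Confident (low-entropy) teacher predictions get larger weights.
    """
    n_classes = len(teacher_logits[0])
    raw = [
        max(0.0, 1.0 - entropy(softmax(row)) / math.log(n_classes))
        for row in teacher_logits
    ]
    total = sum(raw)
    if total == 0.0:  # teacher is uniformly uncertain: fall back to equal weights
        return [1.0] * len(raw)
    return [w * len(raw) / total for w in raw]


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def weighted_info_nce(student_feats, teacher_feats, weights, temperature=0.1):
    """Weighted InfoNCE: the positive pair is (student i, teacher i).

    Teacher features of the other trials in the batch act as negatives.
    """
    n = len(student_feats)
    total = 0.0
    for i in range(n):
        sims = [
            cosine(student_feats[i], teacher_feats[j]) / temperature
            for j in range(n)
        ]
        m = max(sims)
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        total += weights[i] * (log_denom - sims[i])
    return total / n


# Toy batch: two trials, 2-D features, 3 emotion classes (all values made up).
teacher_logits = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
weights = entropy_weights(teacher_logits)
eeg_student = [[1.0, 0.0], [0.0, 1.0]]
em_teacher = [[1.0, 0.0], [0.0, 1.0]]
loss = weighted_info_nce(eeg_student, em_teacher, weights)
```

In this toy batch the first trial has a confident teacher prediction and the second a uniform one, so the first trial dominates the loss. Because the teacher is only needed to compute this training loss, inference can run on the student's modality alone, which matches the deployment setting described in the abstract.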
