
Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition

Kai Shao, Rui Wang, Yixue Hao, Long Hu, Min Chen, Hans Arno Jacobsen

TL;DR

The paper proposes two modules: a spatio-temporal contrasting module that learns fNIRS and EEG representations through weight-sharing multiscale spatio-temporal convolution, and a semantic consistency contrast module that aims to maximize the semantic similarity of fNIRS and EEG.

Abstract

Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal physiological signals representation learning framework using Siamese architecture via multiscale contrasting for depression recognition (MRLMC). First, fNIRS and EEG are transformed into different but correlated data based on a time-domain data augmentation strategy. Then, we design a spatio-temporal contrasting module to learn the representation of fNIRS and EEG through weight-sharing multiscale spatio-temporal convolution. Furthermore, to enhance the learning of semantic representation associated with stimulation tasks, a semantic consistency contrast module is proposed, aiming to maximize the semantic similarity of fNIRS and EEG. Extensive experiments on publicly available and self-collected multimodal physiological signals datasets indicate that MRLMC outperforms the state-of-the-art models. Moreover, our proposed framework is capable of transferring to multimodal time series downstream tasks.
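The idea of maximizing the semantic similarity of paired fNIRS and EEG representations can be illustrated with a generic cross-modal contrastive objective. The sketch below uses a symmetric InfoNCE loss, which is a common choice and an assumption here; the abstract does not state the exact loss used by MRLMC. The function names, the `temperature` value, and the toy embeddings are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_modal_info_nce(fnirs_emb, eeg_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption: fnirs_emb[i] and eeg_emb[i] come from the same trial and
    form a positive pair; every other pairing in the batch is a negative.
    This is a generic stand-in, not the loss defined in the paper.
    """
    f = [l2_normalize(v) for v in fnirs_emb]
    e = [l2_normalize(v) for v in eeg_emb]
    n = len(f)

    # Cosine similarity between every fNIRS/EEG pair, scaled by temperature.
    sim = [
        [sum(a * b for a, b in zip(f[i], e[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def row_loss(logits, target):
        # Cross-entropy of a softmax over `logits` for the `target` index,
        # computed with the log-sum-exp trick for numerical stability.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        return log_sum - logits[target]

    # fNIRS -> EEG direction uses rows; EEG -> fNIRS uses columns.
    f_to_e = sum(row_loss(sim[i], i) for i in range(n)) / n
    e_to_f = sum(row_loss([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return 0.5 * (f_to_e + e_to_f)


# Toy batch: two trials with 2-dimensional embeddings per modality.
fnirs = [[1.0, 0.0], [0.0, 1.0]]
eeg_aligned = [[1.0, 0.0], [0.0, 1.0]]   # same trial order as fnirs
eeg_shuffled = [[0.0, 1.0], [1.0, 0.0]]  # trial order swapped

loss_aligned = cross_modal_info_nce(fnirs, eeg_aligned)
loss_shuffled = cross_modal_info_nce(fnirs, eeg_shuffled)
```

In this toy batch the loss is lower when the two modalities agree on each trial than when the pairs are swapped, which is the behaviour a semantic consistency objective relies on.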

Paper Structure (20 sections, 12 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Overview of the MRLMC framework. MRLMC adopts a Siamese network architecture composed of a multimodal signal input, a spatio-temporal contrasting module, and a semantic consistency module.
  • Figure 2: The input modes of multimodal signals in MRLMC: single-modal mode and multimodal mode.
  • Figure 3: Overview of the multiscale spatio-temporal convolutional (MSC) network. The raw or augmented input data passes through a convolution layer to generate an embedding. The spatio-temporal representation is then extracted by multiscale convolution.
  • Figure 4: The architecture of the transformer unit in the semantic consistency module.
  • Figure 5: The channel locations of fNIRS and EEG. Orange marks the 16 NIR emitters, blue the 16 NIR receivers, green the 53 fNIRS channels, and purple the 16 EEG channels.
  • ...and 2 more figures