ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities

Julie Mordacq, Leo Milecki, Maria Vakalopoulou, Steve Oudot, Vicky Kalogeiton

TL;DR

ADAPT tackles robust multimodal learning for physiological state detection when some modalities may be missing. It uses a two-step approach: (1) anchoring, which aligns every modality to a strong anchor modality via contrastive learning, forming a shared embedding space at a cost that scales linearly with the number of modalities, and (2) a Masked Multimodal Transformer that fuses modalities with masked attention to handle missing data while modeling inter- and intra-modal correlations. The method is validated on StressID (stress induced by specific triggers) and LOC (loss of consciousness induced by $g$-forces), where it achieves state-of-the-art performance and remains robust to missing modalities across extensive ablations. The work offers a practical framework for real-world medical and safety applications by enabling reliable multimodal inference even when some sensors are unavailable.
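
The anchoring step can be illustrated with a minimal, dependency-free sketch. It assumes an InfoNCE-style contrastive loss with a temperature of 0.1; the summary only says "contrastive learning", so the exact loss, the temperature, and the function names `l2_normalize` and `anchoring_loss` are illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def anchoring_loss(anchor_batch, modality_batch, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of the anchor batch should
    match row i of the other modality and mismatch every other row."""
    anchors = [l2_normalize(v) for v in anchor_batch]
    others = [l2_normalize(v) for v in modality_batch]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Cosine similarity of anchor i with every sample of the other modality.
        logits = [
            sum(a * o for a, o in zip(anchor, other)) / temperature
            for other in others
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # Cross-entropy with the matching sample (index i) as the target.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy embeddings: two samples, two dimensions (hypothetical values).
video = [[1.0, 0.0], [0.0, 1.0]]       # anchor modality
audio_good = [[0.9, 0.1], [0.1, 0.9]]  # roughly aligned with the anchor
audio_bad = [[0.1, 0.9], [0.9, 0.1]]   # pairs swapped

print(anchoring_loss(video, audio_good))
print(anchoring_loss(video, audio_bad))
```

In this toy setting the aligned pairing yields a lower loss than the swapped one, which is the signal that would pull each modality encoder toward the anchor's embedding space.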

Abstract

Multimodality has recently gained attention in the medical domain, where imaging or video modalities may be integrated with biomedical signals or health records. Yet, two challenges remain: balancing the contributions of modalities, especially in cases with a limited amount of data available, and tackling missing modalities. To address both issues, in this paper, we introduce the AnchoreD multimodAl Physiological Transformer (ADAPT), a multimodal, scalable framework with two key components: (i) aligning all modalities in the space of the strongest, richest modality (called anchor) to learn a joint embedding space, and (ii) a Masked Multimodal Transformer, leveraging both inter- and intra-modality correlations while handling missing modalities. We focus on detecting physiological changes in two real-life scenarios: stress in individuals induced by specific triggers and fighter pilots' loss of consciousness induced by $g$-forces. We validate the generalizability of ADAPT through extensive experiments on two datasets for these tasks, where we set the new state of the art while demonstrating its robustness across various modality scenarios and its high potential for real-life applications.
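
One way to see why aligning everything to a single anchor scales well is to count the modality pairs that need a contrastive objective. The sketch below compares anchor-based alignment with aligning every pair of modalities; the modality names and helper functions are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def anchored_pairs(modalities, anchor):
    """Pairs to align when every modality is matched to one anchor."""
    return [(anchor, m) for m in modalities if m != anchor]


def all_pairs(modalities):
    """Pairs to align when every modality is matched to every other."""
    return list(combinations(modalities, 2))


# Hypothetical modality names, for illustration only.
modalities = ["video", "audio", "biosignal_1", "biosignal_2"]

print(len(anchored_pairs(modalities, "video")))  # M - 1 pairs
print(len(all_pairs(modalities)))                # M * (M - 1) / 2 pairs
```

With M modalities, anchoring needs M - 1 alignments, whereas exhaustive pairwise alignment needs M(M - 1)/2, so adding a sensor adds one alignment rather than M.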

Paper Structure (29 sections, 2 equations, 4 figures, 8 tables)

Figures (4)

  • Figure 1: Overview of ADAPT. In each minibatch, ADAPT takes up to $M$ modalities, including video, audio, and biosignals, as input and produces a modality-agnostic representation for downstream tasks. It is trained in two steps. (i) Anchoring. The representations of all modalities are aligned, via contrastive learning, to that of an anchor modality, i.e., the strongest and richest modality (here, video). (ii) Fusion. The encoders' features are concatenated and fed into the Masked Multimodal Transformer. When a modality is unavailable, the transformer masks its corresponding feature representations. The final representation (i.e., the [CLS] token output) is used for downstream tasks.
  • Figure 2: True positive rate (TPR) vs. true negative rate (TNR) for LOC. † Methods from Chaptoukaev et al. (2023), StressID.
  • Figure 3: Example of a loss-of-consciousness launch.
  • Figure 4: Example of a no-loss-of-consciousness launch.
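
The fusion step described in the Figure 1 caption can be sketched as masked attention from a [CLS] token over the concatenated modality features. This is a deliberately simplified illustration: one attention head, one layer, identity query/key/value projections, and one token per modality are all assumptions made here, and `fuse` is a hypothetical name rather than part of the authors' code.

```python
import math


def softmax(scores):
    """Stable softmax; entries equal to -inf receive exactly zero weight."""
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(cls_token, modality_tokens, available):
    """Single-head attention from [CLS] over [CLS] plus modality tokens.

    `available[i]` is False when modality i is missing; its score is set
    to -inf so it cannot influence the fused representation.
    """
    keys = [cls_token] + modality_tokens
    keep = [True] + list(available)  # [CLS] always attends to itself
    scale = math.sqrt(len(cls_token))
    scores = [
        sum(q * k for q, k in zip(cls_token, key)) / scale
        if visible else float("-inf")
        for key, visible in zip(keys, keep)
    ]
    weights = softmax(scores)
    # Weighted sum of the tokens (identity value projection assumed).
    fused = [
        sum(w * key[d] for w, key in zip(weights, keys))
        for d in range(len(cls_token))
    ]
    return fused, weights


# Toy features (hypothetical values): video, audio, one biosignal.
cls_token = [1.0, 0.0]
tokens = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
mask = [True, False, True]  # the audio modality is missing

fused, weights = fuse(cls_token, tokens, mask)
print(weights)
print(fused)
```

Because the masked position gets zero attention weight, whatever placeholder fills the missing modality's slot has no effect on the [CLS] output, which is the property that lets the same model run on any subset of sensors.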