Table of Contents
Fetching ...

M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training

Lakmal Meegahapola, Hamza Hassoune, Daniel Gatica-Perez

TL;DR

M3BAT tackles distribution shift in multimodal mobile sensing by introducing a multi-branch adversarial UDA architecture that aligns source and target domains while preserving modality-specific information. The method leverages a staged training protocol with gradient reversal, branch-wise adaptation weights based on distribution-shift statistics, and a transition from a shared encoder to multiple modality-specific encoders. Empirical evaluation on WENET and WEEE across classification and regression tasks shows consistent gains over standard DANN and competitive performance relative to transfer learning, with up to about 12% AUC and 0.13 MAE improvements. The work highlights the importance of modality-aware adaptation, the impact of ground-truth label quality, and practical design choices for deploying robust multimodal sensing models in real-world settings.

Abstract

Over the years, multimodal mobile sensing has been used extensively for inferences regarding health and well being, behavior, and context. However, a significant challenge hindering the widespread deployment of such models in real world scenarios is the issue of distribution shift. This is the phenomenon where the distribution of data in the training set differs from the distribution of data in the real world, the deployment environment. While extensively explored in computer vision and natural language processing, and while prior research in mobile sensing briefly addresses this concern, current work primarily focuses on models dealing with a single modality of data, such as audio or accelerometer readings, and consequently, there is little research on unsupervised domain adaptation when dealing with multimodal sensor data. To address this gap, we did extensive experiments with domain adversarial neural networks (DANN) showing that they can effectively handle distribution shifts in multimodal sensor data. Moreover, we proposed a novel improvement over DANN, called M3BAT, unsupervised domain adaptation for multimodal mobile sensing with multi-branch adversarial training, to account for the multimodality of sensor data during domain adaptation with multiple branches. Through extensive experiments conducted on two multimodal mobile sensing datasets, three inference tasks, and 14 source-target domain pairs, including both regression and classification, we demonstrate that our approach performs effectively on unseen domains. Compared to directly deploying a model trained in the source domain to the target domain, the model shows performance increases up to 12% AUC (area under the receiver operating characteristics curves) on classification tasks, and up to 0.13 MAE (mean absolute error) on regression tasks.

M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training

TL;DR

M3BAT tackles distribution shift in multimodal mobile sensing by introducing a multi-branch adversarial UDA architecture that aligns source and target domains while preserving modality-specific information. The method leverages a staged training protocol with gradient reversal, branch-wise adaptation weights based on distribution-shift statistics, and a transition from a shared encoder to multiple modality-specific encoders. Empirical evaluation on WENET and WEEE across classification and regression tasks shows consistent gains over standard DANN and competitive performance relative to transfer learning, with up to about 12% AUC and 0.13 MAE improvements. The work highlights the importance of modality-aware adaptation, the impact of ground-truth label quality, and practical design choices for deploying robust multimodal sensing models in real-world settings.

Abstract

Over the years, multimodal mobile sensing has been used extensively for inferences regarding health and well being, behavior, and context. However, a significant challenge hindering the widespread deployment of such models in real world scenarios is the issue of distribution shift. This is the phenomenon where the distribution of data in the training set differs from the distribution of data in the real world, the deployment environment. While extensively explored in computer vision and natural language processing, and while prior research in mobile sensing briefly addresses this concern, current work primarily focuses on models dealing with a single modality of data, such as audio or accelerometer readings, and consequently, there is little research on unsupervised domain adaptation when dealing with multimodal sensor data. To address this gap, we did extensive experiments with domain adversarial neural networks (DANN) showing that they can effectively handle distribution shifts in multimodal sensor data. Moreover, we proposed a novel improvement over DANN, called M3BAT, unsupervised domain adaptation for multimodal mobile sensing with multi-branch adversarial training, to account for the multimodality of sensor data during domain adaptation with multiple branches. Through extensive experiments conducted on two multimodal mobile sensing datasets, three inference tasks, and 14 source-target domain pairs, including both regression and classification, we demonstrate that our approach performs effectively on unseen domains. Compared to directly deploying a model trained in the source domain to the target domain, the model shows performance increases up to 12% AUC (area under the receiver operating characteristics curves) on classification tasks, and up to 0.13 MAE (mean absolute error) on regression tasks.
Paper Structure (41 sections, 1 equation, 10 figures, 5 tables)

This paper contains 41 sections, 1 equation, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Base architecture for UDA with features from multimodal sensors, encoder, domain and target classifier/regressor, and gradient reversal layer.
  • Figure 2: Modification to the base architecture to have multiple branches that concatenate to create a feature embedding.
  • Figure 3: Using different $\lambda$ for branches depending on the average distribution shift of features in the branch. When there is little to no shift, $\lambda$$\approx$0 (green).
  • Figure 4: Average Cohen's-d values for modalities. Italy is the source domain.
  • Figure 5: Average Cohen's-d values for modalities. Mongolia is the source domain.
  • ...and 5 more figures