M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training

Lakmal Meegahapola; Hamza Hassoune; Daniel Gatica-Perez

M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training

Lakmal Meegahapola, Hamza Hassoune, Daniel Gatica-Perez

TL;DR

M3BAT tackles distribution shift in multimodal mobile sensing by introducing a multi-branch adversarial UDA architecture that aligns source and target domains while preserving modality-specific information. The method leverages a staged training protocol with gradient reversal, branch-wise adaptation weights based on distribution-shift statistics, and a transition from a shared encoder to multiple modality-specific encoders. Empirical evaluation on WENET and WEEE across classification and regression tasks shows consistent gains over standard DANN and competitive performance relative to transfer learning, with up to about 12% AUC and 0.13 MAE improvements. The work highlights the importance of modality-aware adaptation, the impact of ground-truth label quality, and practical design choices for deploying robust multimodal sensing models in real-world settings.

Abstract

Over the years, multimodal mobile sensing has been used extensively for inferences regarding health and well being, behavior, and context. However, a significant challenge hindering the widespread deployment of such models in real world scenarios is the issue of distribution shift. This is the phenomenon where the distribution of data in the training set differs from the distribution of data in the real world, the deployment environment. While extensively explored in computer vision and natural language processing, and while prior research in mobile sensing briefly addresses this concern, current work primarily focuses on models dealing with a single modality of data, such as audio or accelerometer readings, and consequently, there is little research on unsupervised domain adaptation when dealing with multimodal sensor data. To address this gap, we did extensive experiments with domain adversarial neural networks (DANN) showing that they can effectively handle distribution shifts in multimodal sensor data. Moreover, we proposed a novel improvement over DANN, called M3BAT, unsupervised domain adaptation for multimodal mobile sensing with multi-branch adversarial training, to account for the multimodality of sensor data during domain adaptation with multiple branches. Through extensive experiments conducted on two multimodal mobile sensing datasets, three inference tasks, and 14 source-target domain pairs, including both regression and classification, we demonstrate that our approach performs effectively on unseen domains. Compared to directly deploying a model trained in the source domain to the target domain, the model shows performance increases up to 12% AUC (area under the receiver operating characteristics curves) on classification tasks, and up to 0.13 MAE (mean absolute error) on regression tasks.

M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training

TL;DR

Abstract

Paper Structure (41 sections, 1 equation, 10 figures, 5 tables)

This paper contains 41 sections, 1 equation, 10 figures, 5 tables.

Introduction
Background and Related Work
Distribution Shift
Unsupervised Domain Adaptation
Adapting Models to Individuals
Mobile Sensing for Inferences Regarding Health and Well-Being, Behavior, and Context
Domain Generalization in Mobile Sensing
Domain Adaptation in Mobile Sensing
Summary
M3BAT Architecture
Unsupervised Domain Adaptation with Domain Adversarial Training (DANN)
Multiple Branches to Process Multimodal Data
Training Process with Multimodal Domain Adversarial Training
Step 1: Train Common Encoder with Target Discriminator
Step 2: Introduce Unlabeled Target Domain Data
...and 26 more sections

Figures (10)

Figure 1: Base architecture for UDA with features from multimodal sensors, encoder, domain and target classifier/regressor, and gradient reversal layer.
Figure 2: Modification to the base architecture to have multiple branches that concatenate to create a feature embedding.
Figure 3: Using different $\lambda$ for branches depending on the average distribution shift of features in the branch. When there is little to no shift, $\lambda$$\approx$0 (green).
Figure 4: Average Cohen's-d values for modalities. Italy is the source domain.
Figure 5: Average Cohen's-d values for modalities. Mongolia is the source domain.
...and 5 more figures

M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training

TL;DR

Abstract

M3BAT: Unsupervised Domain Adaptation for Multimodal Mobile Sensing with Multi-Branch Adversarial Training

Authors

TL;DR

Abstract

Table of Contents

Figures (10)