MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT

Xiaomin Ouyang, Jason Wu, Tomoyoshi Kimura, Yihan Lin, Gunjan Verma, Tarek Abdelzaher, Mani Srivastava

TL;DR

MMBind tackles multimodal learning with distributed and heterogeneous IoT data by binding incomplete samples through a shared modality to create pseudo-paired data, followed by weighted contrastive learning in an adaptive multimodal architecture. The two-stage process (pairing incomplete data via a shared modality, then training with heterogeneous modality combinations) enables robust multimodal embeddings even with limited naturally paired data and under domain shift. Across ten real-world datasets, MMBind consistently outperforms baselines and demonstrates practical feasibility for edge deployment and potential for IoT multimodal foundation model training. The work highlights the importance of shared-modality choice, data pairing quality, and adaptive training strategies in leveraging fragmented IoT data for scalable multimodal learning.

Abstract

Multimodal sensing systems are increasingly prevalent in various real-world applications. Most existing multimodal learning approaches heavily rely on training with a large amount of synchronized, complete multimodal data. However, such a setting is impractical in real-world IoT sensing applications where data is typically collected by distributed nodes with heterogeneous data modalities, and is also rarely labeled. In this paper, we propose MMBind, a new data binding approach for multimodal learning on distributed and heterogeneous IoT data. The key idea of MMBind is to construct a pseudo-paired multimodal dataset for model training by binding data from disparate sources and incomplete modalities through a sufficiently descriptive shared modality. We also propose a weighted contrastive learning approach to handle domain shifts among disparate data, coupled with an adaptive multimodal learning architecture capable of training models with heterogeneous modality combinations. Evaluations on ten real-world multimodal datasets highlight that MMBind outperforms state-of-the-art baselines under varying degrees of data incompleteness and domain shift, and holds promise for advancing multimodal foundation model training in IoT applications. The source code is available at https://github.com/nesl/multimodal-bind.
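
The binding idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that); it is a minimal illustration under stated assumptions: two incomplete datasets share one modality, each sample already has an embedding of that shared modality, pairing is done by nearest cosine similarity, and that similarity is reused as a per-pair weight in an InfoNCE-style contrastive loss. The function names (`bind_by_shared_modality`, `weighted_info_nce`), the toy vectors, and the string placeholders such as `"imu_0"` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bind_by_shared_modality(set_ab, set_ac):
    """Build pseudo-pairs from two incomplete datasets (assumed pairing rule).

    set_ab: list of (shared_embedding, b_sample) tuples
    set_ac: list of (shared_embedding, c_sample) tuples
    Each b_sample is matched to the c_sample whose shared-modality embedding
    is most similar. Returns (b_sample, c_sample, weight) tuples, where weight
    is the cosine similarity clipped to [0, 1].
    """
    pairs = []
    for shared_1, b in set_ab:
        best_sim, best_c = max(
            ((cosine(shared_1, shared_2), c) for shared_2, c in set_ac),
            key=lambda t: t[0],
        )
        pairs.append((b, best_c, max(0.0, min(1.0, best_sim))))
    return pairs


def weighted_info_nce(emb_b, emb_c, weights, temperature=0.1):
    """Weighted InfoNCE-style loss (an assumed form, not taken from the paper).

    Row i treats (emb_b[i], emb_c[i]) as the positive pair and every other
    emb_c[j] as a negative. Each row's loss is scaled by weights[i], so
    low-confidence pseudo-pairs contribute less.
    """
    total = 0.0
    for i, (zb, w) in enumerate(zip(emb_b, weights)):
        logits = [cosine(zb, zc) / temperature for zc in emb_c]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += w * (log_denom - logits[i])
    return total / max(sum(weights), 1e-12)


# Toy data: 2-D shared-modality embeddings with placeholder payloads.
set_ab = [([1.0, 0.0], "imu_0"), ([0.0, 1.0], "imu_1")]
set_ac = [([0.9, 0.1], "skel_0"), ([0.1, 0.9], "skel_1")]
pairs = bind_by_shared_modality(set_ab, set_ac)

# Toy encoder outputs for the bound pairs, in the same order as `pairs`.
weights = [w for _, _, w in pairs]
aligned = weighted_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], weights)
misaligned = weighted_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]], weights)
```

In this toy run each sample is bound to the partner with the closest shared-modality embedding, and the loss is lower when the embeddings of bound pairs agree than when they are swapped. A real pipeline would replace the brute-force search with batched similarity over learned embeddings.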

Paper Structure

This paper contains 43 sections, 8 equations, 15 figures, 9 tables.

Figures (15)

  • Figure 1: MMBind features a data binding approach that effectively integrates disparate data with heterogeneous modalities for multimodal learning, which outperforms the model binding approach, where various modalities are indirectly aligned through the encoder of a shared modality.
  • Figure 2: Impact of limited complete multimodal data.
  • Figure 3: Adding pseudo-paired data enhances multimodal performance.
  • Figure 4: MMBind binds heterogeneous multimodal data from distributed nodes using shared sensor modalities or labels to effectively train a full-modality model.
  • Figure 5: MMBind consists of two stages to bind distributed and heterogeneous IoT data for multimodal training, i.e., pairing incomplete data with shared modalities and weighted contrastive learning with heterogeneous data.
  • ...and 10 more figures