Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms

Yuanzhe Peng, Jieming Bian, Lei Wang, Yin Huang, Jie Xu

TL;DR

Multimodal Federated Learning (MFL) sits at the intersection of multimodal data fusion and distributed privacy-preserving training. This work proposes a paradigm-aware taxonomy that organizes MFL research by FL paradigm: horizontal FL (HFL), vertical FL (VFL), and hybrid FL. For each paradigm it details the problem formulation, the training algorithms, and the modality-specific challenges. It surveys representative methods across HFL, VFL, and hybrid FL, discusses applications and public datasets, and outlines open challenges such as modality heterogeneity, privacy leakage, efficiency, and personalization. By anchoring MFL to FL paradigms, the paper clarifies the trade-offs between modality diversity, privacy guarantees, and system constraints, and points to future directions including cross-modal knowledge transfer and interpretable, personalized MFL systems.

Abstract

Multimodal Federated Learning (MFL) lies at the intersection of two pivotal research areas: leveraging complementary information from multiple modalities to improve downstream inference performance and enabling distributed training to enhance efficiency and preserve privacy. Despite the growing interest in MFL, there is currently no comprehensive taxonomy that organizes MFL through the lens of different Federated Learning (FL) paradigms. This perspective is important because multimodal data introduces distinct challenges across various FL settings. These challenges, including modality heterogeneity, privacy heterogeneity, and communication inefficiency, are fundamentally different from those encountered in traditional unimodal or non-FL scenarios. In this paper, we systematically examine MFL within the context of three major FL paradigms: horizontal FL (HFL), vertical FL (VFL), and hybrid FL. For each paradigm, we present the problem formulation, review representative training algorithms, and highlight the most prominent challenge introduced by multimodal data in distributed settings. We also discuss open challenges and provide insights for future research. By establishing this taxonomy, we aim to uncover the novel challenges posed by multimodal data from the perspective of different FL paradigms and to offer a new lens through which to understand and advance the development of MFL.

Paper Structure

This paper contains 20 sections, 5 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: MFL draws inspiration from human multi-sensory collaborative learning.
  • Figure 2: (a) HFL addresses horizontally partitioned sample spaces with consistent feature spaces. (b) VFL addresses vertically partitioned feature spaces with consistent sample spaces. (c) Hybrid FL arises from partitioning both the sample space and the feature space. Note that all three paradigms discussed in this paper involve multimodal data, which introduces new challenges compared to traditional unimodal or non-FL settings.
  • Figure 3: Our proposed taxonomy presents MFL from the perspective of different FL paradigms. The key challenges we highlight are specific to the integration of multimodality within each FL paradigm, rather than general issues found in unimodal FL or centralized multimodal learning.
  • Figure 4: Multimodal HFL, where each client shares the same multimodal feature space but holds a different sample space.
  • Figure 5: The problem of modality heterogeneity in computational pathology.
  • ...and 5 more figures
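
The three partitioning schemes described in the Figure 2 caption can be illustrated with a toy sketch. This is not the paper's formulation: the sample IDs, modality names, and helper functions below are hypothetical, and the hybrid case is modeled in one simple way (split the samples first, then split each group by modality).

```python
# Toy illustration of HFL / VFL / hybrid partitioning.
# All names here (SAMPLES, MODALITIES, the helper functions) are assumptions
# made for illustration; they do not come from the paper.
SAMPLES = ["s0", "s1", "s2", "s3"]
MODALITIES = ["image", "text"]


def horizontal_partition(samples, modalities, num_clients):
    """HFL: clients hold different samples but the same modalities."""
    return [
        {"samples": samples[i::num_clients], "modalities": list(modalities)}
        for i in range(num_clients)
    ]


def vertical_partition(samples, modalities):
    """VFL: clients hold the same samples but one modality each."""
    return [{"samples": list(samples), "modalities": [m]} for m in modalities]


def hybrid_partition(samples, modalities, num_groups):
    """Hybrid: split the sample space, then split each group by modality."""
    clients = []
    for group in horizontal_partition(samples, modalities, num_groups):
        clients.extend(vertical_partition(group["samples"], group["modalities"]))
    return clients


hfl_clients = horizontal_partition(SAMPLES, MODALITIES, num_clients=2)
vfl_clients = vertical_partition(SAMPLES, MODALITIES)
hybrid_clients = hybrid_partition(SAMPLES, MODALITIES, num_groups=2)

for name, clients in [("HFL", hfl_clients), ("VFL", vfl_clients), ("Hybrid", hybrid_clients)]:
    print(name)
    for idx, client in enumerate(clients):
        print(f"  client {idx}: samples={client['samples']} modalities={client['modalities']}")
```

In this sketch the HFL clients differ only in which samples they hold, the VFL clients differ only in which modality they hold, and the hybrid clients differ in both.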