
FedMM: Federated Multi-Modal Learning with Modality Heterogeneity in Computational Pathology

Yuanzhe Peng, Jieming Bian, Jie Xu

TL;DR

Privacy-preserving multimodal learning in computational pathology is challenged by modality heterogeneity across hospitals. FedMM addresses this by federating multiple single-modal feature extractors, one per modality. It uses global prototypes as pseudo-labels and a dynamic loss that aligns local embeddings with global representations while preserving local supervision. The server aggregates the feature extractors and prototypes via FL mechanisms, so clients can perform local feature extraction and classification without sharing raw data. On the TCGA-NSCLC and TCGA-RCC datasets, FedMM achieves higher accuracy and AUC than two baselines, demonstrating practical, privacy-preserving benefits for modality-heterogeneous multimodal learning in pathology.
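The TL;DR does not state the exact form of FedMM's dynamic loss. The sketch below is a minimal illustration of a client objective under stated assumptions, not the authors' formulation. It assumes a FedProto-style alignment term: the squared distance between a sample's embedding and the global prototype of its class. It also assumes a linear ramp for the alignment weight. The names `local_loss` and `dynamic_weight`, and the linear schedule, are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cross_entropy(logits: Vector, label: int) -> float:
    """Numerically stable softmax cross-entropy for one sample (local supervision)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dynamic_weight(rnd: int, total_rounds: int, max_weight: float = 1.0) -> float:
    """HYPOTHETICAL schedule: alignment weight ramps linearly from 0 to max_weight."""
    return max_weight * min(1.0, rnd / total_rounds)


def local_loss(embedding: Vector, class_logits: Vector, label: int,
               global_prototypes: List[Vector], rnd: int, total_rounds: int) -> float:
    """Supervised loss plus a weighted pull toward the global prototype of the label.

    ASSUMPTION: the global prototype of the sample's class serves as its target
    (FedProto-style); FedMM's actual pseudo-labelling may differ.
    """
    supervised = cross_entropy(class_logits, label)
    target = global_prototypes[label]
    alignment = squared_distance(embedding, target)
    return supervised + dynamic_weight(rnd, total_rounds) * alignment


# Toy example: 2 classes, 2-D embeddings, halfway through training.
protos = [[1.0, 0.0], [0.0, 1.0]]
print(round(local_loss([0.0, 0.0], [0.0, 0.0], 0, protos, rnd=5, total_rounds=10), 4))
# → 1.1931  (log 2 from the cross-entropy + 0.5 * 1.0 from alignment)
```

With this schedule, early rounds are driven mostly by local labels. The pull toward the global prototypes grows as those prototypes stabilize. This is one plausible reading of "dynamic," chosen here only for illustration.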

Abstract

The fusion of complementary multimodal information is crucial for accurate diagnostics in computational pathology. However, existing multimodal learning approaches require access to users' raw data, posing substantial privacy risks. While Federated Learning (FL) serves as a privacy-preserving alternative, it falls short in handling heterogeneous (yet possibly overlapping) modality data across hospitals. To bridge this gap, we propose a Federated Multi-Modal (FedMM) learning framework that trains multiple single-modal feature extractors in a federated manner to improve subsequent classification performance, rather than training a unified multimodal fusion model as existing FL approaches do. Any participating hospital, even one with a small-scale dataset or limited devices, can leverage these federatedly trained extractors to perform local downstream tasks (e.g., classification) while ensuring data privacy. Through comprehensive evaluations on two publicly available datasets, we demonstrate that FedMM notably outperforms two baselines in terms of accuracy and AUC.
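Because hospitals hold different (possibly overlapping) subsets of modalities, the server can aggregate each modality's extractor only over the clients that hold that modality. The sketch below illustrates this with FedAvg-style sample-count weighting. It also averages class prototypes per (modality, class). Several details are assumptions rather than the paper's specification: the weighting scheme, the flattened-parameter representation, the modality names `wsi` and `genomic`, and the function names. The toy setup mirrors Figure 2 ($N=3$ clients, $M=2$ modalities).

```python
from typing import Dict, List, Tuple

Vector = List[float]


def weighted_average(vectors: List[Vector], weights: List[float]) -> Vector:
    """Element-wise weighted mean of equal-length vectors."""
    total = sum(weights)
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(len(vectors[0]))]


def aggregate_extractors(
    client_updates: List[Dict[str, Tuple[Vector, int]]]
) -> Dict[str, Vector]:
    """FedAvg-style aggregation performed separately for each modality.

    Each update maps modality -> (flattened extractor parameters, sample count).
    A client contributes only to the modalities it actually holds.
    """
    buckets: Dict[str, List[Tuple[Vector, int]]] = {}
    for update in client_updates:
        for modality, (params, n) in update.items():
            buckets.setdefault(modality, []).append((params, n))
    return {m: weighted_average([p for p, _ in items], [n for _, n in items])
            for m, items in buckets.items()}


def aggregate_prototypes(
    client_protos: List[Dict[str, Dict[int, Tuple[Vector, int]]]]
) -> Dict[str, Dict[int, Vector]]:
    """Average class prototypes per (modality, class), weighted by class counts."""
    buckets: Dict[Tuple[str, int], List[Tuple[Vector, int]]] = {}
    for protos in client_protos:
        for modality, per_class in protos.items():
            for label, (vec, count) in per_class.items():
                buckets.setdefault((modality, label), []).append((vec, count))
    global_protos: Dict[str, Dict[int, Vector]] = {}
    for (modality, label), items in buckets.items():
        global_protos.setdefault(modality, {})[label] = weighted_average(
            [v for v, _ in items], [c for _, c in items])
    return global_protos


# HYPOTHETICAL setup mirroring Figure 2: N = 3 clients, M = 2 modalities.
updates = [
    {"wsi": ([1.0, 1.0], 10)},                                # client 1: WSI only
    {"wsi": ([3.0, 3.0], 30), "genomic": ([2.0, 0.0], 30)},   # client 2: both
    {"genomic": ([0.5, 1.0], 20)},                            # client 3: genomics only
]
print(aggregate_extractors(updates))
# → {'wsi': [2.5, 2.5], 'genomic': [1.4, 0.4]}
```

Under this scheme, client 3 never influences the `wsi` extractor. Client 2, the only client holding both modalities, contributes to both aggregates.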

Paper Structure

This paper contains 11 sections, 11 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: The problem of modality heterogeneity in computational pathology (left) and our solution (right). The aim is to harness distributed, privacy-sensitive data, which may overlap in some modalities across several hospitals, to train multiple single-modal feature extractors in a federated manner. These federated extractors are intended to outperform locally trained extractors.
  • Figure 2: An illustration of a multimodal FL system with modality heterogeneity. It includes $N=3$ clients and one central server with $M=2$ modality processing components.
  • Figure 3: Comparison of classification performance between FedMM and baselines on two public datasets.
  • Figure 4: Ablation study of FedMM on the TCGA-NSCLC dataset.