Multimodal Federated Learning With Missing Modalities through Feature Imputation Network

Pranav Poudel, Aavash Chhetri, Prashnna Gyawali, Georgios Leontidis, Binod Bhattarai

TL;DR

This work tackles missing modalities in multimodal federated learning for healthcare by introducing a lightweight Feature Imputation Network (FIN) that operates in the representation space. FIN learns cross-modal mappings between bottleneck features from encoders for image $(I)$ and text $(T)$, enabling inference with incomplete data without sharing raw samples. Evaluations on MIMIC-CXR, NIH Open-I, and CheXpert across homogeneous and heterogeneous client configurations show that FIN outperforms naive imputations and a public-data–based generative baseline while remaining competitive with CAR-MFL, all with substantially lower communication and computation costs. The approach offers a privacy-preserving, scalable alternative for integrating multimodal information in clinical settings and points to extensions to additional modalities and architectures.

Abstract

Multimodal federated learning holds immense potential for collaboratively training models from multiple sources without sharing raw data, addressing both data scarcity and privacy concerns, two key challenges in healthcare. A major obstacle to training multimodal federated models in healthcare is missing modalities, which arise for several reasons, including variations in clinical practice, cost and accessibility constraints, retrospective data collection, privacy concerns, and occasional technical or human errors. Previous methods typically rely on publicly available real datasets or synthetic data to compensate for missing modalities. However, obtaining real datasets for every disease is impractical, and training generative models to synthesize missing modalities is computationally expensive and prone to errors due to the high dimensionality of medical data. In this paper, we propose a novel, lightweight, low-dimensional feature translator to reconstruct bottleneck features of the missing modalities. Experiments on three datasets (MIMIC-CXR, NIH Open-I, and CheXpert), in both homogeneous and heterogeneous settings, show that our method consistently improves the performance of competitive baselines. The code and implementation details are available at: https://github.com/bhattarailab/FedFeatGen
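To make the idea of a feature translator concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single linear layer trained with a mean-squared-error loss by plain SGD, using toy 4-dimensional "image" features and 3-dimensional "text" features; the actual FIN architecture, loss, and feature sizes are not specified in this summary. The names `train_translator`, `apply_translator`, and all dimensions are hypothetical.

```python
import random

def apply_translator(W, b, x):
    """Map one source-modality feature vector to the target feature space."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def train_translator(src, tgt, lr=0.05, epochs=200, seed=0):
    """Fit W, b so that W x + b approximates the paired target feature (MSE, SGD).

    Assumption: a linear map stands in for the imputation network.
    """
    rng = random.Random(seed)
    d_in, d_out = len(src[0]), len(tgt[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    b = [0.0] * d_out
    for _ in range(epochs):
        for x, y in zip(src, tgt):
            pred = apply_translator(W, b, x)
            for j in range(d_out):
                err = pred[j] - y[j]  # gradient of 0.5 * (pred - y)^2 w.r.t. pred
                for i in range(d_in):
                    W[j][i] -= lr * err * x[i]
                b[j] -= lr * err
    return W, b

# Toy "multimodal client": paired bottleneck features for both modalities.
# The text features are a fixed linear function of the image features,
# purely so the example has a known answer.
rng = random.Random(1)
true_map = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
image_feats = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(32)]
text_feats = [apply_translator(true_map, [0.0] * 3, x) for x in image_feats]

W, b = train_translator(image_feats, text_feats)

# Toy "image-only client": impute the missing text feature, then fuse.
query = [0.2, -0.4, 0.1, 0.7]
imputed_text = apply_translator(W, b, query)
fused = query + imputed_text  # concatenation as a stand-in for the fusion step

# Reconstruction error on the paired training features.
mse = sum(
    (p - t) ** 2
    for x, y in zip(image_feats, text_feats)
    for p, t in zip(apply_translator(W, b, x), y)
) / (len(image_feats) * 3)
```

The translator here works on low-dimensional feature vectors rather than raw images or reports, which is why it is cheap to train compared with a generative model over the raw modality. Concatenation is used as the fusion step only for illustration.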

Paper Structure

This paper contains 12 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Sample data from three datasets collected at three different institutions. CheXpert provides only X-ray scans, while the other two benchmarks provide both X-ray scans and radiology reports. This is an instance of a missing modality in a real-world scenario.
  • Figure 2: Illustration of Feature Imputation Network-based Multimodal Federated Learning. (a) Multimodal Federated Learning system with different types of clients. (b) Training of the Feature Imputation Network in a multimodal client. (c) Architecture of the Feature Imputation Network. (d) Unimodal image client training with the help of the Feature Imputation Network.
  • Figure 3: t-SNE plot of feature vectors from the model trained in (a) the homogeneous setup and (b) the heterogeneous setup. In the figure, Upperbound refers to the model trained in a federated manner with complete modalities. Feature vectors are generated using the validation data.