Multi-Modal Federated Learning for Cancer Staging over Non-IID Datasets with Unbalanced Modalities

Kasra Borazjani, Naji Khosravan, Leslie Ying, Seyyedali Hosseinalipour

TL;DR

The paper tackles cancer staging with multi-modal data distributed across institutions that may hold only a subset of the data modalities and whose data are non-IID. It introduces two techniques: Distributed Gradient Blending (DGB), which weights modality gradients non-uniformly, and Proximity-Aware Client Weighting (PCW), which adjusts for data-quality differences across clients. In TCGA-based experiments on BRCA, LUSC, and LIHC with mRNA, image, and clinical data, the approach shows improved convergence, reduced modality bias, and better cross-cohort balance than the baselines, approaching an upper bound in which all modalities are uniformly available. The work advances practical multi-modal FL for healthcare by explicitly addressing modality heterogeneity and non-IID data, with implications for scalable privacy-preserving cancer prognostics and future extensions to data imputation and attention-based fusion.

Abstract

The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When accompanied by the innovative federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified approach neglects institutions that have access to only a portion of the data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with the varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments using The Cancer Genome Atlas program (TCGA) datalake, considering different cancer types and three modalities of data: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based vs. type-based heterogeneity across institutions on model performance, which broadens the notion of data heterogeneity in the multi-modal FL literature.
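
The gradient blending idea named in the abstract can be illustrated with a minimal Python sketch. It only shows the generic step of combining per-modality gradients with non-uniform weights. The function name `blend_gradients`, the modality keys, and the weight values are hypothetical, and how the paper actually computes its blending weights is not stated in this summary.

```python
from typing import Dict, List


def blend_gradients(
    grads: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Combine per-modality gradients using normalized, non-uniform weights.

    `weights` are assumed to be given; deriving them from convergence
    behaviour is the part this sketch does not cover.
    """
    total = sum(weights[m] for m in grads)
    if total <= 0:
        raise ValueError("blending weights must sum to a positive value")
    dim = len(next(iter(grads.values())))
    blended = [0.0] * dim
    for modality, grad in grads.items():
        w = weights[modality] / total  # normalize so the weights sum to 1
        blended = [b + w * g for b, g in zip(blended, grad)]
    return blended


# Hypothetical gradients of a shared classifier for two modalities.
example_grads = {"mrna": [1.0, 0.0], "image": [0.0, 2.0]}
# Hypothetical weights: the mRNA gradient is emphasized 3:1 over the image one.
example_weights = {"mrna": 3.0, "image": 1.0}
blended = blend_gradients(example_grads, example_weights)
```

With uniform weights the function reduces to a plain average of the modality gradients, which is the behaviour non-uniform blending is meant to improve on when modalities converge at different speeds.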

Paper Structure (29 sections, 19 equations, 5 figures, 18 tables, 1 algorithm)

This paper contains 29 sections, 19 equations, 5 figures, 18 tables, 1 algorithm.

Figures (5)

  • Figure 1: (a) Our system model: each institution $n$ can have access to a subset of data modalities ($\mathcal{M}_n \subseteq \mathcal{M}$). (b) The server gathers encoders and classifiers and aggregates them to global models and sends them back to institutions.
  • Figure 2: A schematic of the encoder-classifier structure at an arbitrary institution $n$ with three data modalities.
  • Figure 3: Loss plots of the global models based on the modalities possessed across institutions in the first 100 global aggregation steps. Plots consider various modality combinations: (a) image and mRNA, (b) clinical and mRNA, and (c) image and clinical. The shaded areas represent the noise (20% standard deviation) for each combination of modalities tested. A key observation is the occasional degradation of the performance of the multi-modal ML models compared to their single-modal counterparts. This observation is consistent with the claims in mm-fl-gradblend, which emphasizes the necessity of synchronizing the convergence rates of the modalities in a multi-modal FL scenario. Further, our method balances the convergence rates across the modalities.
  • Figure 4: A sample of the histopathological image, containing dense pixels.
  • Figure 5: Bar plot of the values of $\overline{\rho}_{C,n}^{(t)}$ (y-axis) for all institutions $n \in \mathcal{N}$ (x-axis) for each scenario: (a) binary classification, type-based, (b) binary classification, class-based, (c) multi-class classification, type-based, and (d) multi-class classification, class-based. The numbers beside each bar plot indicate the category of the institution's data based on Table \ref{tab:type-heterogeneity-distr} for (a) and (c) and Table \ref{tab:class-heteogeneity-distr} for (b) and (d).
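
The server-side step described in the caption of Figure 1 can be sketched as follows, where each institution $n$ uploads encoders only for its own modalities $\mathcal{M}_n \subseteq \mathcal{M}$. This is a minimal sketch under stated assumptions: the sample-count weighting is a FedAvg-style choice made here for illustration and is not the paper's proximity-aware client weighting, and the function name, institution names, and parameter vectors are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def aggregate_by_modality(
    client_encoders: Dict[str, Dict[str, Vector]],
    client_sizes: Dict[str, int],
) -> Dict[str, Vector]:
    """Average each modality's encoder over the institutions that hold it.

    Institutions missing a modality simply do not contribute to that
    modality's global encoder. Weights are local sample counts (assumed).
    """
    sums: Dict[str, Vector] = {}
    totals: Dict[str, int] = {}
    for client, encoders in client_encoders.items():
        n_samples = client_sizes[client]
        for modality, params in encoders.items():
            if modality not in sums:
                sums[modality] = [0.0] * len(params)
                totals[modality] = 0
            sums[modality] = [
                s + n_samples * p for s, p in zip(sums[modality], params)
            ]
            totals[modality] += n_samples
    return {m: [s / totals[m] for s in sums[m]] for m in sums}


# Hypothetical setup: hospital_b holds no image data.
clients = {
    "hospital_a": {"mrna": [1.0, 2.0], "image": [0.0, 4.0]},
    "hospital_b": {"mrna": [3.0, 6.0]},
}
sizes = {"hospital_a": 10, "hospital_b": 30}
global_encoders = aggregate_by_modality(clients, sizes)
```

Here the global image encoder equals hospital_a's encoder because no other institution holds that modality, while the mRNA encoder is a weighted average of both.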

Theorems & Definitions (1)

  • Remark 1