Addressing Data Heterogeneity in Federated Learning of Cox Proportional Hazards Models

Navid Seidi, Satyaki Roy, Sajal K. Das, Ardhendu Tripathy

TL;DR

This work tackles data heterogeneity and privacy in federated survival analysis by extending the Cox Proportional Hazards model to a federated setting. It introduces two heterogeneity-handling strategies: Naive Global Models Parameters Averaging for common features and Feature Presence Clustering to form cluster-wise aggregations based on feature availability, complemented by an event-based reporting scheme to cut communication. The approach demonstrates improved predictive accuracy, via Concordance Index gains, on simulated data and the SEER breast cancer dataset, and shows favorable computational efficiency compared with baselines. The results suggest practical pathways for robust, privacy-preserving survival analysis in heterogeneous healthcare data, with potential for personalized medicine applications.
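The Concordance Index mentioned above measures how often a model ranks pairs of patients correctly: among comparable pairs, the patient with the earlier observed event should have the higher predicted risk. The following is a minimal sketch of Harrell's C-index in plain Python, not the evaluation code used in the paper. The function name and argument names are hypothetical, and pairs with tied times are skipped as a simplifying assumption.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (illustrative sketch; names are hypothetical).

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: ignore pairs with tied times
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject is censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_cindex = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a "gain" in C-index means more pairs are ordered correctly.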

Abstract

The diversity in disease profiles and therapeutic approaches between hospitals and health professionals underscores the need for patient-centric personalized strategies in healthcare. Alongside this, similarities in disease progression across patients can be utilized to improve prediction models in survival analysis. The need for patient privacy and the utility of prediction models can be simultaneously addressed in the framework of Federated Learning (FL). This paper outlines an approach in the domain of federated survival analysis, specifically the Cox Proportional Hazards (CoxPH) model, with a specific focus on mitigating data heterogeneity and elevating model performance. We present an FL approach that employs feature-based clustering to enhance model accuracy across synthetic datasets and real-world applications, including the Surveillance, Epidemiology, and End Results (SEER) database. Furthermore, we consider an event-based reporting strategy that provides a dynamic approach to model adaptation by responding to local data changes. Our experiments show the efficacy of our approach and discuss future directions for a practical application of FL in healthcare.
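To make the idea of feature-based clustering concrete, here is a minimal sketch under stated assumptions. It groups centers whose sets of available features are identical and then averages the local CoxPH coefficients within each group, weighted by local sample size. The exact clustering rule and aggregation weights used in the paper are not given in this summary, so the identical-mask rule, the sample-size weighting, and all names below are hypothetical.

```python
from collections import defaultdict


def cluster_by_feature_presence(masks):
    """Group center ids by their binary feature-presence mask.

    Assumption: two centers share a cluster only if their masks are identical.
    """
    clusters = defaultdict(list)
    for center_id, mask in masks.items():
        clusters[tuple(int(bool(m)) for m in mask)].append(center_id)
    return dict(clusters)


def aggregate_cluster(members, betas, sizes):
    """Sample-size-weighted average of local coefficients within one cluster."""
    total = sum(sizes[c] for c in members)
    dim = len(betas[members[0]])
    return [
        sum(sizes[c] * betas[c][k] for c in members) / total
        for k in range(dim)
    ]


# Toy example: three centers, three candidate features.
# Centers A and B record features 0 and 1; center C records features 0 and 2.
masks = {"A": [1, 1, 0], "B": [1, 1, 0], "C": [1, 0, 1]}
# Local coefficients, one entry per feature that the center actually has.
betas = {"A": [0.2, -0.4], "B": [0.6, 0.0], "C": [0.1, 0.3]}
sizes = {"A": 100, "B": 300, "C": 50}

clusters = cluster_by_feature_presence(masks)
cluster_betas = {
    mask: aggregate_cluster(members, betas, sizes)
    for mask, members in clusters.items()
}
```

In this toy setup, A and B are aggregated together because their coefficient vectors refer to the same features, while C keeps its own cluster-level model instead of being averaged with incompatible parameters.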

Paper Structure

This paper contains 20 sections, 1 theorem, 35 equations, 3 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

Under the assumptions of smoothness, strong convexity, and proper clusters, the Feature Presence Clustering algorithm converges to the optimal solution $\beta_{\text{global}, i}^*$ of the Cox Proportional Hazards model for each cluster $i$ if the learning rate satisfies $\eta < \mu/L^2$. The converg… where $\beta^{(t)}$ denotes the global parameter for cluster $C_i$ at iteration $t$ of FL and $T$ i…
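
For context, the standard CoxPH objective can be written as follows. This is textbook background rather than a restatement of the truncated theorem: the symbols $h_0$, $x_j$, $\delta_j$, and $R(t_j)$ are notation assumed here, and reading the update as a plain gradient step with learning rate $\eta$ is likewise an assumption. The model specifies the hazard

$$h(t \mid x) = h_0(t)\,\exp(\beta^\top x),$$

and $\beta$ is typically estimated by minimizing the negative log partial likelihood

$$\ell(\beta) = -\sum_{j:\,\delta_j = 1} \left[ \beta^\top x_j - \log \sum_{k \in R(t_j)} \exp(\beta^\top x_k) \right],$$

where $\delta_j = 1$ marks an observed event and $R(t_j)$ is the set of patients still at risk at time $t_j$. A gradient step would then read $\beta^{(t+1)} = \beta^{(t)} - \eta\,\nabla \ell(\beta^{(t)})$. Since $\ell$ is convex in $\beta$, conditions such as smoothness (constant $L$) and strong convexity (constant $\mu$) are natural assumptions for a convergence statement of this kind.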

Figures (3)

  • Figure 1: FL Framework illustration with Feature Presence Clustering. (a) Centers compute local model parameters $\beta_1, \ldots, \beta_n$ and send them to (b) the central Server to aggregate them and update global model parameters $\beta^*$.
  • Figure 2: Running times of the different algorithms (Alg. 1: FedAvg, Alg. 2: proposed algorithm, IFCA) for various numbers of clusters and centers.
  • Figure 3: Effect of dataset size on the selection frequency across learning rounds.

Theorems & Definitions (2)

  • Theorem 1: Convergence of the Feature Presence Clustering algorithm
  • proof