
A Multiparty Homomorphic Encryption Approach to Confidential Federated Kaplan Meier Survival Analysis

Narasimha Raghavan Veeraragavan, Svetlana Boudko, Jan Franz Nygård

TL;DR

This work tackles privacy-preserving federated Kaplan–Meier survival analysis by introducing a multiparty threshold CKKS homomorphic encryption framework with native floating-point support. It provides a formal utility-loss and convergence analysis, along with explicit reconstruction-attack mitigation, and validates the approach on NCCTG Lung Cancer and synthetic Breast Cancer datasets. The results show that encrypted federated KM estimates closely match centralized or non-encrypted federated results, preserving statistical validity (e.g., log-rank tests) while enabling secure aggregation across up to 50 sites, albeit with an 8–19× runtime overhead. The study demonstrates that reconstruction attacks are most potent in small federations with high data overlap but are significantly weakened as federation size grows or overlaps decrease, highlighting the practical privacy benefits of threshold HE in multi-institutional survival analysis. Overall, the framework advances secure, high-fidelity federated survival analysis with formal guarantees and practical applicability for moderate-scale collaborations.

Abstract

The proliferation of healthcare data has expanded opportunities for collaborative research, yet stringent privacy regulations hinder pooling sensitive patient records. We propose a multiparty homomorphic encryption-based framework for privacy-preserving federated Kaplan–Meier survival analysis, offering native floating-point support, a theoretical model, and explicit reconstruction-attack mitigation. Compared to prior work, our framework ensures that encrypted federated survival estimates closely match centralized outcomes, supported by formal utility-loss bounds that demonstrate convergence as aggregation and decryption noise diminish. Extensive experiments on the NCCTG Lung Cancer and synthetic Breast Cancer datasets confirm low mean absolute error (MAE) and root mean squared error (RMSE), indicating negligible deviations between encrypted and non-encrypted survival curves. Log-rank and numerical accuracy tests reveal no significant difference between federated encrypted and non-encrypted analyses, preserving statistical validity. A reconstruction-attack evaluation shows that smaller federations (2–3 providers) with overlapping data between institutions are vulnerable, a challenge mitigated by multiparty encryption. Larger federations (5–50 sites) degrade reconstruction accuracy further, with encryption improving confidentiality. Despite an 8–19× computational overhead, threshold-based homomorphic encryption is feasible for moderate-scale deployments, balancing security and runtime. By providing robust privacy guarantees alongside high-fidelity survival estimates, our framework advances the state of the art in secure multi-institutional survival analysis.
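
To make the aggregation idea concrete, the following is a minimal plaintext sketch, not the authors' implementation. It assumes each site reports, on a shared time grid, its event counts and at-risk counts (the quantities the paper's reconstruction figures call $d_t$ and $n_{\text{at\_risk}}$), and that the aggregator only needs their element-wise sums. The plain Python `sum` stands in for additive aggregation over ciphertexts; no encryption library is used here, and the function names, the toy data, and the shared grid are all hypothetical.

```python
import math

def site_counts(times, events, grid):
    """One site's event counts d and at-risk counts n at each grid time."""
    d = [sum(1 for t, e in zip(times, events) if e and t == g) for g in grid]
    n = [sum(1 for t in times if t >= g) for g in grid]
    return d, n

def aggregate(vectors):
    """Element-wise sum across sites (plaintext stand-in for encrypted addition)."""
    return [sum(col) for col in zip(*vectors)]

def km_curve(d, n):
    """Kaplan-Meier product-limit estimate on the grid: S *= 1 - d_i / n_i."""
    s, curve = 1.0, []
    for di, ni in zip(d, n):
        if ni > 0:
            s *= 1.0 - di / ni
        curve.append(s)
    return curve

def mae(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical toy data: (times, event indicators) for two sites.
grid = [1, 2, 3, 4]
site_a = ([1, 2, 3], [1, 0, 1])
site_b = ([2, 3, 4], [1, 1, 0])

# Federated path: each site shares only its count vectors.
counts = [site_counts(t, e, grid) for t, e in (site_a, site_b)]
d_fed = aggregate([c[0] for c in counts])
n_fed = aggregate([c[1] for c in counts])
fed_curve = km_curve(d_fed, n_fed)

# Centralized path: pool the raw records, then estimate.
pooled_t = site_a[0] + site_b[0]
pooled_e = site_a[1] + site_b[1]
central_curve = km_curve(*site_counts(pooled_t, pooled_e, grid))

print(fed_curve, mae(fed_curve, central_curve), rmse(fed_curve, central_curve))
```

Without encryption noise the two curves coincide, because the summed counts equal the pooled counts. In an encrypted setting with approximate arithmetic, the MAE and RMSE helpers would instead measure the small deviation between the two curves.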

Paper Structure (55 sections, 4 theorems, 11 equations, 7 figures, 4 tables, 5 algorithms)


Key Result

Theorem 1

Let $\hat{S}_{\text{centralized}}(t)$ and $\hat{S}_{\text{federated}}(t)$ denote the Kaplan–Meier survival probabilities estimated at time $t$ in centralized and federated homomorphic encrypted (HE) settings, respectively. Assume: [...] Under these assumptions, the utility loss $\Delta S(t)$ and error growth can be analyzed as follows. Utility Loss Bound: [...] where $M_k \propto D_k$ reflects the relationship [...]

Figures (7)

  • Figure 1: Centralized Survival Curves for the Lung and Breast Cancer datasets
  • Figure 2: Federated and Federated Encrypted Survival Curves for the Lung Cancer dataset at Varying Client Counts
  • Figure 3: Federated and Federated Encrypted Survival Curves for the Breast Cancer dataset at Varying Client Counts
  • Figure 4: Numerical Accuracy Results for the Lung and Breast Cancer datasets
  • Figure 5: NCCTG Lung Cancer Dataset: Reconstruction metrics comparing None, Small, and Large Overlap for $d_t$ and $n_{\text{at\_risk}}$. The x-axis is the number of providers, and each curve shows how an attacker's reconstruction quality changes with the federation size.
  • ...and 2 more figures

Theorems & Definitions (8)

  • Theorem 1: Utility Loss Bound in Federated Homomorphic Encrypted Kaplan-Meier Estimators for Worst Case Scenario
  • proof
  • Theorem 2: Utility Loss Bound for Bounded At-Risk Counts with Noise Terms
  • proof
  • Theorem 3: Convergence of Federated Kaplan-Meier Estimators
  • proof
  • Theorem 4: Scalability of Noise with Client Count
  • proof