A Privacy-Preserving Federated Learning Method with Homomorphic Encryption in Omics Data

Yusaku Negoya, Feifei Cui, Zilong Zhang, Miao Pan, Tomoaki Ohtsuki, Aohan Li

TL;DR

This paper tackles privacy-preserving federated learning on sensitive omics data by addressing the DP-HE trade-off. It introduces PPML-Hybrid, a hybrid framework where clients autonomously choose between Homomorphic Encryption (HE) and Differential Privacy (DP), enabling noise-free gradient updates from HE clients while lighter DP-protected updates reduce computation. Empirical evaluation on a spatial transcriptomics dataset shows that PPML-Hybrid achieves predictive accuracy near the HE-only baseline and outperforms DP-only approaches under equivalent privacy budgets, with substantial gains in efficiency when heterogeneity in client resources is present. The approach offers a practical pathway to scalable, privacy-preserving omics FL in real-world, resource-diverse settings, with potential applicability beyond omics to other sensitive domains.

Abstract

Omics data is widely employed in medical research to identify disease mechanisms and contains highly sensitive personal information. Federated Learning (FL) with Differential Privacy (DP) can protect the privacy of omics data against malicious user attacks. However, FL with DP faces an inherent trade-off: stronger privacy protection degrades predictive accuracy because of the injected noise. Homomorphic Encryption (HE), on the other hand, allows computation on encrypted data, so encrypted gradients can be aggregated without DP-induced noise, which can increase predictive accuracy. However, HE may increase the computation cost. To improve predictive accuracy while accounting for the computational ability of heterogeneous clients, we propose a Privacy-Preserving Machine Learning (PPML)-Hybrid method that introduces HE. In the proposed PPML-Hybrid method, distributed clients select either HE or DP based on their computational resources, so that HE clients contribute noise-free updates while DP clients reduce computational overhead. Meanwhile, clients with high computational resources can flexibly adopt HE or DP according to their privacy needs. Performance evaluation on omics datasets shows that our proposed method achieves comparable predictive accuracy while significantly reducing computation time relative to the HE-only method. Additionally, it outperforms DP-only methods under equivalent or stricter privacy budgets.
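The hybrid idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the threshold-based selection rule, the clipping bound `max_norm`, and the noise multiplier `sigma` are all hypothetical, and HE is only simulated in plaintext (an additively homomorphic scheme would return the same sum after decryption, up to the scheme's precision). The point is simply that only the DP path injects noise into the aggregate.

```python
import math
import random


def clip(update, max_norm):
    """Scale the update so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_protect(update, max_norm, sigma, rng):
    """DP path (assumed Gaussian mechanism): clip, then add noise
    with standard deviation sigma * max_norm to each coordinate."""
    clipped = clip(update, max_norm)
    return [x + rng.gauss(0.0, sigma * max_norm) for x in clipped]


def choose_mode(compute_score, threshold):
    """Hypothetical selection rule: well-resourced clients pick HE,
    the rest fall back to the cheaper DP path."""
    return "HE" if compute_score >= threshold else "DP"


def hybrid_aggregate(updates, modes, max_norm, sigma, rng):
    """Average client updates; HE clients contribute without noise.

    HE is simulated in plaintext here, so this shows the arithmetic
    of the aggregate only, not the encryption itself.
    """
    total = [0.0] * len(updates[0])
    for update, mode in zip(updates, modes):
        if mode == "HE":
            contribution = update  # noise-free contribution
        else:
            contribution = dp_protect(update, max_norm, sigma, rng)
        total = [t + c for t, c in zip(total, contribution)]
    return [t / len(updates) for t in total]


# Usage: four clients, two with enough compute to choose HE.
rng = random.Random(0)
updates = [[0.5, -0.2], [0.4, -0.1], [0.6, -0.3], [0.5, -0.2]]
scores = [8, 2, 9, 1]  # hypothetical compute scores
modes = [choose_mode(s, threshold=5) for s in scores]
averaged = hybrid_aggregate(updates, modes, max_norm=1.0, sigma=0.5, rng=rng)
print(modes)
print(averaged)
```

In this toy setting, raising the share of HE clients lowers the total noise in the averaged update, which is the intuition behind the accuracy gain over a DP-only setup; the computational cost of real encryption is not modeled.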

Paper Structure

This paper contains 21 sections, 4 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: The proposed hybrid architecture with HE and DP clients.
  • Figure 2: Loss (MSE) vs. Number of Clients.
  • Figure 3: Comparison of PPML-Hybrid and PPML-Omics at different privacy budgets ($\epsilon=4,8$) with fixed HE ratio $\alpha=0.5$.
  • Figure 4: FL Time vs. Number of Clients.