Research on Large Language Model Cross-Cloud Privacy Protection and Collaborative Training based on Federated Learning

Ze Yang, Yihong Jin, Yihan Zhang, Juntian Liu, Xinhe Xu

TL;DR

This work addresses privacy and data-security challenges in cross-cloud LLM training by proposing a federated-learning framework that combines cryptographic gradient updates, dynamic bandwidth-aware aggregation, and cross-cloud synchronization. The methodology includes a cryptographic gradient update mechanism that uses $\mathcal{E}(\nabla L_k(w_k))$ with block-wise parallel processing and hybrid encrypted-plaintext aggregation, together with a dynamic weighting rule $\omega_k = \frac{L_k(w_k)}{|D_k| \cdot B_k^{\alpha}}$ and a global update $w_t = \frac{\sum_{k=1}^{K} \omega_k w_k(t)}{\sum_{k=1}^{K} \omega_k}$ that adapt to heterogeneity and network conditions; an asynchronous synchronization strategy further mitigates cross-cloud latency. The paper reports improved privacy protection, reduced communication and computation costs, and competitive model performance on the MIMIC-III dataset compared with centralized, FL, HE-FL, and DP-FL baselines. These contributions offer a practical pathway to secure, efficient cross-cloud collaboration on large-scale language models in privacy-sensitive domains, built on cryptography, dynamic aggregation, and real-time network-aware synchronization.
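
The weighting rule and global update quoted in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that $L_k(w_k)$ is the local loss of client $k$, $|D_k|$ its local dataset size, $B_k$ its available bandwidth, and $\alpha$ a tunable exponent; the summary does not define these symbols, so this reading is an assumption. The function names `dynamic_weights` and `aggregate` are hypothetical, the parameters are handled as plaintext lists of floats, and the encryption step $\mathcal{E}(\cdot)$ is omitted entirely.

```python
from typing import List, Sequence


def dynamic_weights(
    losses: Sequence[float],
    data_sizes: Sequence[int],
    bandwidths: Sequence[float],
    alpha: float = 1.0,
) -> List[float]:
    """Compute omega_k = L_k(w_k) / (|D_k| * B_k**alpha) for every client.

    The interpretation of each symbol is an assumption (see lead-in).
    """
    if not (len(losses) == len(data_sizes) == len(bandwidths)):
        raise ValueError("all per-client inputs must have the same length")
    return [
        loss / (size * (bw ** alpha))
        for loss, size, bw in zip(losses, data_sizes, bandwidths)
    ]


def aggregate(
    client_params: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Compute w_t = sum_k(omega_k * w_k(t)) / sum_k(omega_k), coordinate-wise."""
    if len(client_params) != len(weights):
        raise ValueError("need exactly one weight per client")
    total = sum(weights)
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same parameter dimension")
    return [
        sum(w * p[i] for w, p in zip(weights, client_params)) / total
        for i in range(dim)
    ]


# Toy example with two clients and a two-dimensional parameter vector.
omegas = dynamic_weights(
    losses=[0.5, 1.0], data_sizes=[100, 100], bandwidths=[1.0, 1.0], alpha=1.0
)
global_params = aggregate([[0.0, 3.0], [3.0, 0.0]], omegas)
```

In this toy setting the second client has twice the loss of the first at equal data size and bandwidth, so it receives twice the weight and the aggregate lies closer to its parameters. Because the update divides by the sum of the weights, only their ratios matter.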

Abstract

The rapid development of large language models (LLMs) and the widespread adoption of cloud computing have made privacy protection and data security key challenges in cross-cloud model deployment and training. We present a framework that addresses these issues and enables privacy-preserving collaborative training across distributed clouds based on federated learning. Our mechanism combines cryptographic primitives, dynamic model aggregation techniques, and cross-cloud data harmonization solutions to improve the security, efficiency, and scalability of the traditional federated learning paradigm. Furthermore, we propose a hybrid aggregation scheme that mitigates the threat of data leakage and optimizes the aggregation of model updates, thereby substantially improving model effectiveness and stability. Experimental results demonstrate that the training efficiency, privacy protection, and model accuracy of the proposed method compare favorably to those of traditional federated learning.

Paper Structure

This paper contains 9 sections, 8 equations, 3 figures.

Figures (3)

  • Figure 1: Comparison of Data Leakage Rate
  • Figure 2: Comparison of Communication Cost
  • Figure 3: Comparison of Computation Cost