FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission

Zeling Zhang, Dongqi Cai, Yiran Zhang, Mengwei Xu, Shangguang Wang, Ao Zhou

TL;DR

FedRDMA is a communication-efficient cross-silo FL system that integrates RDMA into the FL communication protocol. It divides the updated model into chunks and applies a series of optimization techniques to improve the efficiency and robustness of RDMA-based communication.

Abstract

Communication overhead is a significant bottleneck in federated learning (FL), and it has been exacerbated by the increasing size of AI models. In this paper, we propose FedRDMA, a communication-efficient cross-silo FL system that integrates RDMA into the FL communication protocol. To overcome the limitations of RDMA in wide-area networks (WANs), FedRDMA divides the updated model into chunks and applies a series of optimization techniques to improve the efficiency and robustness of RDMA-based communication. We implement FedRDMA atop an industrial federated learning framework and evaluate it in a real-world cross-silo FL scenario. The experimental results show that FedRDMA can achieve up to a 3.8× speedup in communication efficiency compared to traditional TCP/IP-based FL systems.
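
The chunking idea in the abstract can be illustrated with a minimal Python sketch. It is not the FedRDMA implementation: the function names `split_into_chunks` and `reassemble`, the index tagging, and the chunk size are all assumptions made for illustration, and no RDMA calls are made. It only shows how a serialized model update could be cut into fixed-size pieces that are sent independently and put back in order at the receiver.

```python
from typing import Iterable, Iterator, Tuple


def split_into_chunks(payload: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, chunk) pairs covering the payload in order.

    chunk_size is a hypothetical tuning knob; the last chunk may be shorter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(payload), chunk_size)):
        yield index, payload[offset:offset + chunk_size]


def reassemble(chunks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Rebuild the payload from (index, chunk) pairs received in any order."""
    return b"".join(data for _, data in sorted(chunks, key=lambda c: c[0]))


# Stand-in for a serialized model update (2560 bytes).
update = bytes(range(256)) * 10
chunks = list(split_into_chunks(update, chunk_size=1000))
restored = reassemble(reversed(chunks))  # simulate out-of-order arrival
```

Tagging each chunk with its index keeps reassembly independent of arrival order, so in this sketch a lost or delayed chunk could be resent on its own rather than restarting the whole transfer.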

Paper Structure (21 sections, 8 figures, 4 tables)

Figures (8)

  • Figure 1: FedLLM convergence performance under different bandwidths.
  • Figure 2: TCP/IP vs. RDMA protocol.
  • Figure 3: RDMA brings significant performance improvement in-domain but fails to work cross-domain.
  • Figure 4: The process of chunking packets for smooth RDMA transfer.
  • Figure 5: FedRDMA-E Workflow. Client B is sending chunked data X to Client A's A2 memory region.
  • ...and 3 more figures