Byzantine-Robust Federated Learning over Ring-All-Reduce Distributed Computing

Minghong Fang; Zhuqing Liu; Xuecen Zhao; Jia Liu

TL;DR

This work addresses the scalability and security challenges of federated learning by coupling Byzantine robustness with ring-all-reduce architectures. BRACE introduces 1-bit gradient quantization, neighbor sub-vector exchange, and a dimension-wise consensus threshold to mitigate poisoning while preserving bandwidth efficiency, achieving an $O(1/T)$ convergence rate under Byzantine attacks. The method demonstrates robustness and reduced communication costs on Fashion-MNIST and CIFAR-10 with large client counts, outperforming server-based and standard RAR defenses. The results suggest BRACE enables scalable, secure, serverless FL suitable for large-scale distributed deployments with heterogeneous data.

Abstract

Federated learning (FL) has gained attention as a distributed learning paradigm for its data privacy benefits and accelerated convergence through parallel computation. Traditional FL relies on a server-client (SC) architecture, where a central server coordinates multiple clients to train a global model, but this approach faces scalability challenges due to server communication bottlenecks. To overcome this, the ring-all-reduce (RAR) architecture has been introduced, eliminating the central server and achieving bandwidth optimality. However, the tightly coupled nature of RAR's ring topology exposes it to unique Byzantine attack risks not present in SC-based FL. Despite the potential of RAR, designing Byzantine-robust RAR-based FL algorithms remains an open problem. To address this gap, we propose BRACE (Byzantine-robust ring-all-reduce), the first RAR-based FL algorithm to achieve both Byzantine robustness and communication efficiency. We provide theoretical guarantees for the convergence of BRACE under Byzantine attacks, demonstrate its bandwidth efficiency, and validate its practical effectiveness through experiments. Our work offers a foundational understanding of Byzantine-robust RAR-based FL design.
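To make the RAR architecture mentioned above concrete, the following is a minimal single-process sketch of plain ring-all-reduce (a reduce-scatter phase followed by an all-gather phase). It is not BRACE and includes no Byzantine defense; the function name `ring_all_reduce` and the requirement that the vector length be divisible by the number of workers are assumptions made for illustration.

```python
def ring_all_reduce(vectors):
    """Simulate plain ring-all-reduce (sum) over one vector per worker.

    Illustrative sketch only: no quantization, no robustness mechanism.
    Assumes every vector has the same length, divisible by the worker count.
    """
    n = len(vectors)
    dim = len(vectors[0])
    if dim % n != 0:
        raise ValueError("vector length must be divisible by the number of workers")
    size = dim // n

    # Each worker splits its local vector into n chunks.
    bufs = [[list(v[c * size:(c + 1) * size]) for c in range(n)] for v in vectors]

    # Phase 1 (reduce-scatter): in step s, worker i sends chunk (i - s) mod n
    # to its ring successor, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step are simultaneous.
        sends = [(i, (i - step) % n) for i in range(n)]
        payloads = [list(bufs[i][c]) for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            bufs[dst][c] = [a + b for a, b in zip(bufs[dst][c], payload)]

    # After phase 1, worker i holds the fully summed chunk (i + 1) mod n.
    # Phase 2 (all-gather): completed chunks are passed around the ring
    # and overwrite the stale local copies.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n) for i in range(n)]
        payloads = [list(bufs[i][c]) for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            bufs[dst][c] = payload

    # Reassemble each worker's chunks into a full vector.
    return [[x for chunk in b for x in chunk] for b in bufs]


workers = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
result = ring_all_reduce(workers)
```

In this sketch each worker sends 2(n - 1) chunks of dim/n elements, so per-worker traffic stays below twice the vector length regardless of n. The sketch also shows why the ring is tightly coupled: every partial sum passes through other workers unverified, so a single worker that alters a chunk it forwards would change the result at every other worker.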
