SketchGuard: Scaling Byzantine-Robust Decentralized Federated Learning via Sketch-Based Screening

Murtaza Rangwala, Farag Azzedin, Richard O. Sinnott, Rajkumar Buyya

TL;DR

SketchGuard addresses the scalability bottleneck of Byzantine-robust decentralized federated learning by screening neighbors in the compressed sketch domain. Each client uses Count Sketch to produce a $k$-dimensional summary of its model, screens neighbors on sketch distances that are preserved up to a $(1+\epsilon)$-type factor, and retrieves full models only from accepted neighbors. This reduces per-round communication from $O(d|N_i|)$ to $O(k|N_i| + d|S_i|)$ while maintaining theoretical convergence guarantees in both strongly convex and non-convex settings, with a controlled effective threshold parameter $\gamma_{\text{eff}} = \gamma\sqrt{(1+\epsilon)/(1-\epsilon)}$. The authors prove that the compression preserves Byzantine resilience with degradation bounded by a factor $1+O(\epsilon)$ and provide convergence rates matching state-of-the-art methods. Comprehensive experiments on FEMNIST and CelebA across varied network topologies and attack models show that SketchGuard achieves identical robustness to BALANCE and UBAR while reducing computation by up to $82\%$ and communication by $50$-$70\%$, with benefits that scale multiplicatively with model size and network connectivity.
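The two communication bounds can be made concrete with a small calculation. The sketch below counts the parameters a single client receives per round under each scheme. The function names and all numeric values are hypothetical, chosen only for illustration, and are not taken from the paper's experiments.

```python
def full_exchange_cost(d: int, num_neighbors: int) -> int:
    """Parameters received when every neighbor sends a full model: d * |N_i|."""
    return d * num_neighbors


def sketch_screening_cost(d: int, k: int, num_neighbors: int, num_accepted: int) -> int:
    """Parameters received with sketch screening: k * |N_i| + d * |S_i|."""
    if not 0 <= num_accepted <= num_neighbors:
        raise ValueError("accepted neighbors must be a subset of all neighbors")
    return k * num_neighbors + d * num_accepted


# Hypothetical setting: 1M-parameter model, 1k-dimensional sketch,
# 10 neighbors of which 5 pass screening.
d, k, num_neighbors, num_accepted = 1_000_000, 1_000, 10, 5

baseline = full_exchange_cost(d, num_neighbors)
sketched = sketch_screening_cost(d, k, num_neighbors, num_accepted)
saving = 1 - sketched / baseline

print(f"baseline: {baseline}, sketched: {sketched}, saving: {saving:.1%}")
```

In this toy setting the sketch term $k|N_i|$ is negligible next to $d|S_i|$, so the saving is governed almost entirely by the fraction of neighbors rejected during screening.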

Abstract

Decentralized Federated Learning (DFL) enables privacy-preserving collaborative training without centralized servers, but remains vulnerable to Byzantine attacks where malicious clients submit corrupted model updates. Existing Byzantine-robust DFL defenses rely on similarity-based neighbor screening that requires every client to exchange and compare complete high-dimensional model vectors with all neighbors in each training round, creating prohibitive communication and computational costs that prevent deployment at web scale. We propose SketchGuard, a general framework that decouples Byzantine filtering from model aggregation through sketch-based neighbor screening. SketchGuard compresses $d$-dimensional models to $k$-dimensional sketches ($k \ll d$) using Count Sketch for similarity comparisons, then selectively fetches full models only from accepted neighbors, reducing per-round communication complexity from $O(d|N_i|)$ to $O(k|N_i| + d|S_i|)$, where $|N_i|$ is the neighbor count and $|S_i| \le |N_i|$ is the accepted neighbor count. We establish rigorous convergence guarantees in both strongly convex and non-convex settings, proving that Count Sketch compression preserves Byzantine resilience with controlled degradation bounds where approximation errors introduce only a $(1+O(\epsilon))$ factor in the effective threshold parameter. Comprehensive experiments across multiple datasets, network topologies, and attack scenarios demonstrate that SketchGuard maintains identical robustness to state-of-the-art methods while reducing computation time by up to 82% and communication overhead by 50-70% depending on filtering effectiveness, with benefits scaling multiplicatively with model dimensionality and network connectivity. These results establish the viability of sketch-based compression as a fundamental enabler of robust DFL at web scale.
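The screen-then-fetch flow described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the relative-distance acceptance rule, the plain averaging step, the function names, and all numeric values are assumptions made for illustration. Only the use of Count Sketch and the ordering (exchange sketches, screen, then fetch full models from accepted neighbors) come from the text.

```python
import numpy as np


def make_count_sketch(d: int, k: int, seed: int):
    """Return a function mapping a d-dimensional vector to a k-dimensional Count Sketch.

    All clients must use the same seed so that their sketches are comparable.
    """
    rng = np.random.default_rng(seed)
    buckets = rng.integers(0, k, size=d)     # hash h: [d] -> [k]
    signs = rng.choice([-1.0, 1.0], size=d)  # sign s: [d] -> {-1, +1}

    def sketch(x: np.ndarray) -> np.ndarray:
        # Each bucket accumulates the signed coordinates hashed to it.
        return np.bincount(buckets, weights=signs * x, minlength=k)

    return sketch


def screen_neighbors(own_sketch, neighbor_sketches, gamma):
    """Accept neighbors whose sketch lies within gamma * ||own sketch|| (assumed rule)."""
    limit = gamma * np.linalg.norm(own_sketch)
    return [name for name, s in neighbor_sketches.items()
            if np.linalg.norm(own_sketch - s) <= limit]


# Illustrative setup; every value here is hypothetical.
d, k, gamma = 10_000, 512, 0.5
rng = np.random.default_rng(0)
own_model = rng.normal(size=d)
neighbor_models = {
    "honest_a": own_model + 0.01 * rng.normal(size=d),
    "honest_b": own_model + 0.01 * rng.normal(size=d),
    "byzantine": own_model + 5.0 * rng.normal(size=d),
}

sketch = make_count_sketch(d, k, seed=42)
own_sketch = sketch(own_model)
# Step 1: only k-dimensional sketches are exchanged with every neighbor.
neighbor_sketches = {name: sketch(w) for name, w in neighbor_models.items()}
# Step 2: screening happens entirely in the sketch domain.
accepted = screen_neighbors(own_sketch, neighbor_sketches, gamma)
# Step 3: full d-dimensional models are fetched only from accepted neighbors.
fetched = [neighbor_models[name] for name in accepted]
# Step 4: plain averaging stands in for the real aggregation rule.
aggregated = np.mean([own_model, *fetched], axis=0)
```

Because Count Sketch is linear, the difference of two sketches equals the sketch of the difference, which is why distances between models can be estimated without ever exchanging the models themselves.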

Paper Structure

This paper contains 57 sections, 4 theorems, 32 equations, 5 figures, 9 tables, 1 algorithm.

Key Result

Lemma 1

For any vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^d$ and sketch size $k = O(\epsilon^{-2}\log(1/\delta))$, with probability at least $1-\delta$:
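The inequality that completes Lemma 1 does not appear above. As a hedged reading rather than the paper's verbatim statement, the standard Count Sketch distance-preservation form would be as follows, where $\mathrm{CS}(\cdot)$ is a hypothetical name for the sketch map. Assuming in addition a relative acceptance rule $\|\mathrm{CS}(\mathbf{w}_i)-\mathrm{CS}(\mathbf{w}_j)\| \le \gamma\,\|\mathrm{CS}(\mathbf{w}_i)\|$ (also an assumption), this form reproduces the $\gamma_{\text{eff}}$ expression quoted in the TL;DR:

```latex
% Assumed form of the distance-preservation guarantee
(1-\epsilon)\,\|\mathbf{u}-\mathbf{v}\|^2
  \;\le\; \|\mathrm{CS}(\mathbf{u})-\mathrm{CS}(\mathbf{v})\|^2
  \;\le\; (1+\epsilon)\,\|\mathbf{u}-\mathbf{v}\|^2 .

% Consequence for a neighbor j accepted in the sketch domain
% (lower bound applied to w_i - w_j, upper bound applied to w_i with v = 0)
\|\mathbf{w}_i-\mathbf{w}_j\|
  \;\le\; \frac{\|\mathrm{CS}(\mathbf{w}_i)-\mathrm{CS}(\mathbf{w}_j)\|}{\sqrt{1-\epsilon}}
  \;\le\; \frac{\gamma\,\|\mathrm{CS}(\mathbf{w}_i)\|}{\sqrt{1-\epsilon}}
  \;\le\; \gamma\sqrt{\frac{1+\epsilon}{1-\epsilon}}\;\|\mathbf{w}_i\|
  \;=\; \gamma_{\text{eff}}\,\|\mathbf{w}_i\| .

% Small-epsilon expansion, matching the 1 + O(epsilon) degradation factor
\sqrt{\frac{1+\epsilon}{1-\epsilon}} \;=\; 1+\epsilon+O(\epsilon^2) .
```

Under these assumptions, any neighbor accepted on sketches would also pass a full-model test with the slightly looser threshold $\gamma_{\text{eff}}$, which is consistent with the $(1+O(\epsilon))$ factor stated in the abstract.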

Figures (5)

  • Figure 1: The SketchGuard Protocol
  • Figure 2: Network topologies used in the robustness evaluation experiments.
  • Figure 3: Impact of fraction of malicious clients on TER across different datasets and attack types.
  • Figure 4: Impact of fraction of malicious clients on TER across different network topologies.
  • Figure 5: Impact of network size and model dimensionality on per-node computation time.

Theorems & Definitions (5)

  • Lemma 1: Distance Preservation (Charikar et al., 2002)
  • Remark 1
  • Theorem 1: Strongly Convex Convergence with Compression
  • Theorem 2: Non-Convex Convergence with Compression
  • Lemma 2: Sketch Compression Preserves Byzantine Resilience