
Convergence of Byzantine-Resilient Gradient Tracking via Probabilistic Edge Dropout

Amirhossein Dezhboro, Fateme Maleki, Arman Adibi, Erfan Amini, Jose E. Ramirez-Marquez

Abstract

We study distributed optimization over networks with Byzantine agents that may send arbitrary adversarial messages. We propose \emph{Gradient Tracking with Probabilistic Edge Dropout} (GT-PD), a stochastic gradient tracking method that preserves the convergence properties of gradient tracking under adversarial communication. GT-PD combines two complementary defense layers: a universal self-centered projection that clips each incoming message to a ball of radius $\tau$ around the receiving agent, and a fully decentralized probabilistic dropout rule driven by a dual-metric trust score in the decision and tracking channels. This design bounds adversarial perturbations while preserving the doubly stochastic mixing structure, a property often lost under robust aggregation in decentralized settings. Under complete Byzantine isolation ($p_b=0$), GT-PD converges linearly to a neighborhood determined solely by stochastic gradient variance. For partial isolation ($p_b>0$), we introduce \emph{Gradient Tracking with Probabilistic Edge Dropout and Leaky Integration} (GT-PD-L), which uses a leaky integrator to control the accumulation of tracking errors caused by persistent perturbations and achieves linear convergence to a bounded neighborhood determined by the stochastic variance and the clipping-to-leak ratio. We further show that under two-tier dropout with $p_h=1$, isolating Byzantine agents introduces no additional variance into the honest consensus dynamics. Experiments on MNIST under Sign Flip, ALIE, and Inner Product Manipulation attacks show that GT-PD-L outperforms coordinate-wise trimmed mean by up to 4.3 percentage points under stealth attacks.
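The first defense layer, the self-centered projection, admits a one-line sketch: each agent clips every incoming message onto the Euclidean ball of radius $\tau$ centered at its own state, so any adversarial perturbation is bounded by $\tau$ in norm. A minimal illustration (the function name and signature are ours, not the paper's):

```python
import numpy as np

def self_centered_projection(x_recv, x_self, tau):
    """Project a received message onto the ball of radius tau centered
    at the receiver's own state x_self (illustrative sketch)."""
    d = np.asarray(x_recv, dtype=float) - np.asarray(x_self, dtype=float)
    norm = np.linalg.norm(d)
    if norm <= tau:
        # Message is already within the trust ball: pass it through unchanged.
        return np.asarray(x_recv, dtype=float)
    # Otherwise rescale the deviation so its norm equals tau.
    return np.asarray(x_self, dtype=float) + (tau / norm) * d

x_self = np.zeros(3)
honest = np.array([0.1, 0.2, 0.0])      # inside the ball: unchanged
byzantine = np.array([100.0, 0.0, 0.0])  # far outside: clipped to radius tau
y1 = self_centered_projection(honest, x_self, tau=1.0)
y2 = self_centered_projection(byzantine, x_self, tau=1.0)
```

After projection, `y2` lies at distance exactly $\tau$ from `x_self`, which is the enforced perturbation bound the abstract refers to.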


Paper Structure

This paper contains 26 sections, 19 theorems, 89 equations, 3 figures.

Key Result

Proposition 1

If $W$ is symmetric, then $W^k$ defined by eq:Wk is symmetric and doubly stochastic for every realization of the dropout mask. $\blacktriangleleft$
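The mechanism behind Proposition 1 can be checked numerically: if the dropout mask is symmetric and each dropped off-diagonal weight is reabsorbed into the diagonal, then every row still sums to one, and symmetry gives the same for columns. A small sketch under our own assumptions (the exact construction in eq:Wk may differ; here $W$ is a lazy Metropolis-style matrix on a ring):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Base mixing matrix W: symmetric doubly stochastic (ring graph,
# each neighbor weight 1/3, remainder on the diagonal).
W = np.zeros((n, n))
for i in range(n):
    for j in ((i - 1) % n, (i + 1) % n):
        W[i, j] = 1.0 / 3.0
np.fill_diagonal(W, 1.0 - W.sum(axis=1))

# Symmetric Bernoulli dropout mask: edge {i, j} kept with probability p,
# with M[i, j] == M[j, i] so the mask itself is symmetric.
p = 0.7
U = np.triu(rng.random((n, n)) < p, k=1)
M = U | U.T

# Masked matrix: dropped edge weight is moved onto the diagonal,
# so each row (and by symmetry each column) still sums to 1.
Wk = np.where(M, W, 0.0)
np.fill_diagonal(Wk, 1.0 - Wk.sum(axis=1))

print(np.allclose(Wk, Wk.T))           # symmetric
print(np.allclose(Wk.sum(axis=1), 1))  # rows sum to 1
print(np.allclose(Wk.sum(axis=0), 1))  # columns sum to 1
```

Note that the nonnegativity of the diagonal is automatic here because dropping edges only increases the diagonal entries; symmetry of the mask is the essential ingredient.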

Figures (3)

  • Figure 1: GT-PD system model. A decentralized network of $N$ agents communicates over a graph $G$, where $M$ agents are Byzantine adversaries. Honest agents (blue) exchange decision variables via reliable links (solid arrows). Edges to suspected adversaries (dashed) are subject to probabilistic dropout with retention probability $p_{ij} \to 0$.
  • Figure 2: Test accuracy of GT-PD, GT-PD-L, CWTM, and unprotected gradient tracking under three Byzantine attacks on MNIST ($n=20$, $b=4$, non-IID). GT-PD-L achieves the highest accuracy in all three settings, outperforming CWTM by $4.3$ percentage points under the stealth ALIE attack.
  • Figure 3: Extended diagnostics across three attacks (rows). Column (a): test accuracy. Column (b): consensus disagreement $C^k$ on log scale. Column (c): mean retention probabilities for honest-honest ($p_{hh}$, solid) and honest-Byzantine ($p_{hb}$, dashed) edges. Under ALIE, the dropout layer fails to isolate Byzantine agents ($p_{hb} > p_{hh}$), yet GT-PD-L converges via the projection and leaky integrator layers.

Theorems & Definitions (57)

  • Remark 1: Filtration convention
  • Definition 1: Base mixing matrix
  • Definition 2: Dropout mask
  • Proposition 1: Doubly stochastic property
  • Proof
  • Remark 2
  • Definition 3: Self-centered projection
  • Proposition 2: Enforced perturbation bound
  • Proof
  • Proposition 3: Honest non-clipping under complete isolation
  • ...and 47 more