Byzantine-Robust Federated Learning with Learnable Aggregation Weights

Javad Parsa, Amir Hossein Daghestani, André M. H. Teixeira, Mikael Johansson

TL;DR

This work tackles Byzantine-robust federated learning under data heterogeneity by introducing FedLAW, a framework where aggregation weights are learnable and regularized via a sparse unit-capped simplex. The method jointly optimizes the global model ${\boldsymbol \theta}$ and the weights ${\bf w}$ through an alternating minimization scheme, using an inner quadratic-approximation step for ${\boldsymbol \theta}$ and a proximal, three-step projection to enforce sparsity and a unit-sum constraint on ${\bf w}$. Theoretical results establish Byzantine resilience and convergence to a neighborhood of the optimum under non-iid data and adversarial updates, with bounds that depend on asymptotic bias and variance quantities. Empirically, FedLAW consistently outperforms classical Byzantine defenses on MNIST and CIFAR-10 across multiple attack types and levels of data heterogeneity, and its learned weights suppress malicious clients rapidly, highlighting its practical robustness for secure federated deployments.

Abstract

Federated Learning (FL) enables clients to collaboratively train a global model without sharing their private data. However, the presence of malicious (Byzantine) clients poses significant challenges to the robustness of FL, particularly when data distributions across clients are heterogeneous. In this paper, we propose a novel Byzantine-robust FL optimization problem that incorporates adaptive weighting into the aggregation process. Unlike conventional approaches, our formulation treats aggregation weights as learnable parameters, jointly optimizing them alongside the global model parameters. To solve this optimization problem, we develop an alternating minimization algorithm with strong convergence guarantees under adversarial attack. We analyze the Byzantine resilience of the proposed objective. We evaluate the performance of our algorithm against state-of-the-art Byzantine-robust FL approaches across various datasets and attack scenarios. Experimental results demonstrate that our method consistently outperforms existing approaches, particularly in settings with highly heterogeneous data and a large proportion of malicious clients.
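
As a rough illustration of what adaptive weighting in the aggregation step means, the sketch below compares a uniform average with a weighted average in which one client has been assigned zero weight. It is a minimal sketch, not the paper's algorithm: the function name `aggregate`, the toy update vectors, and the "learned" weight vector are all assumptions made for illustration, and the weights are fixed by hand rather than optimized.

```python
import numpy as np

def aggregate(client_updates, weights):
    """Weighted average of client updates, with weights on the unit simplex."""
    updates = np.asarray(client_updates, dtype=float)  # shape (n_clients, dim)
    w = np.asarray(weights, dtype=float)               # shape (n_clients,)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return w @ updates

# Hypothetical toy round: three benign clients and one adversarial update.
updates = np.array([[1.0, 2.0],
                    [1.2, 1.8],
                    [0.8, 2.2],
                    [-10.0, -20.0]])

uniform = np.full(4, 0.25)
# Assumed outcome of weight learning: the adversarial client gets weight 0.
learned = np.array([1 / 3, 1 / 3, 1 / 3, 0.0])

agg_uniform = aggregate(updates, uniform)  # dragged toward the attacker
agg_learned = aggregate(updates, learned)  # mean of the benign updates only
```

With uniform weights the single adversarial update dominates the aggregate, while the hand-picked weight vector recovers the benign mean. In the method described above, such weights are learned jointly with the model rather than set manually.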

Paper Structure

This paper contains 38 sections, 20 theorems, 222 equations, 10 figures, 5 tables, 3 algorithms.

Key Result

Theorem 1

Denote $P_{L_s}({\bf h}_k)$ as the operator selecting the $s$ largest elements of the vector ${\bf h}_k$, and let ${\mathcal{P}}_{\Delta^+_t}$ be the projection operator onto the unit-capped simplex $\Delta^+_{t} = \{{\bf w}\in{\mathbb{R}}^{n}\mid \sum_{i=1}^{n} w_i = 1, w_i\geq 0, w_i\leq t\}$, wh
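
The two operators named in the theorem statement can be sketched numerically. The code below is a sketch under stated assumptions, not the paper's closed form, since the statement is truncated here: the projection onto $\Delta^+_t$ uses generic bisection on a shift $\tau$, the function names are hypothetical, and composing the two operators (project the $s$ largest entries, zero the rest) is one plausible reading of the three-step projection rather than the confirmed procedure.

```python
import numpy as np

def project_capped_simplex(h, t, iters=100):
    """Euclidean projection of h onto {w : sum(w) = 1, 0 <= w_i <= t}."""
    h = np.asarray(h, dtype=float)
    if t * h.size < 1.0:
        raise ValueError("the set is empty unless n * t >= 1")
    # The projection has the form clip(h - tau, 0, t). Its sum is
    # non-increasing in tau, so bisection finds the tau giving sum = 1.
    lo, hi = h.min() - t, h.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(h - tau, 0.0, t).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(h - 0.5 * (lo + hi), 0.0, t)

def sparse_capped_projection(h, s, t):
    """Assumed composition: keep the s largest entries of h, project them
    onto the capped simplex, and set all other entries to zero."""
    h = np.asarray(h, dtype=float)
    idx = np.argsort(h)[-s:]          # indices of the s largest entries
    w = np.zeros_like(h)
    w[idx] = project_capped_simplex(h[idx], t)
    return w

# Hypothetical input: five clients, at most s = 3 active, cap t = 0.5.
h = np.array([0.9, 0.1, 0.5, -0.3, 0.4])
w = sparse_capped_projection(h, s=3, t=0.5)
```

The result has at most $s$ non-zero entries, sums to one, and no entry exceeds $t$, so no single client can hold more than a fraction $t$ of the aggregate. The cap requires $s \cdot t \geq 1$ for the constraint set to be non-empty.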

Figures (10)

  • Figure 1: Test accuracy and weight evolution on MNIST under the inverse gradient attack (setting: $q = 0.9$, 40% malicious; see Section \ref{sec:numstudy}). Left: Average test accuracy $\pm$1 std over 200 epochs and 5 runs, evaluated across multiple methods with 200 clients. Right: Aggregation weights of individual clients during the first 100 epochs (10 clients; MLP, batch size 64, 3 local epochs). Benign clients quickly converge to stable, non-trivial weights, while malicious clients are consistently suppressed.
  • Figure 3: Test accuracy on MNIST for four attack settings ($q{=}0.9$, $10$ clients, $40\%$ malicious), contrasting FedLAW‑BSUM (solid blue) with FedLAW (dashed green).
  • Figure 4: Client‑weight dynamics on MNIST under four adversarial settings ($q{=}0.9$, $10$ clients, $40\%$ malicious). Top two rows: FedLAW. Bottom two rows: FedLAW‑BSUM. Each panel tracks the aggregation weight of every client during the first 100 global epochs for a three‑layer MLP (batch size 64, three local epochs). Across all attacks, benign clients (grey) quickly converge to a stable weight, while malicious clients (red/orange) are pushed towards negligible influence. Notably, FedLAW suppresses attackers faster than FedLAW‑BSUM, especially for the gradient‑based attacks (inverse gradient, double attack), illustrating its stronger resilience.
  • Figure 5: FedLAW sensitivity to $\beta$ on MNIST ($q{=}0.9$, 40% label‑flipping attack). Error bars denote ±1 standard deviation across 5 runs.
  • Figure 6: FedLAW sensitivity to $\beta$ on CIFAR‑10 ($q{=}0.9$, 40% label‑flipping attack). Error bars denote ±1 standard deviation across 5 runs.
  • ...and 5 more figures

Theorems & Definitions (46)

  • Theorem 1
  • proof
  • Remark 1
  • Proposition 1
  • proof
  • Theorem 2: High-Probability Byzantine Resilience
  • proof
  • Theorem 3
  • proof
  • Definition 1
  • ...and 36 more