Improving $(α, f)$-Byzantine Resilience in Federated Learning via layerwise aggregation and cosine distance

Mario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón, Francisco Herrera

TL;DR

Byzantine resilience in Federated Learning degrades in high-dimensional settings, hindering robust aggregation. The authors propose Layerwise Cosine Aggregation, which combines layerwise partitioning with cosine distance and median gradient clipping to improve robustness without adding computational overhead. They prove that layerwise application preserves $(\alpha,f)$-Byzantine resilience and report accuracy gains of up to 16% over baseline robust operators (Krum, Bulyan, GeoMed) across multiple image datasets and attack scenarios. The approach offers a scalable, practical way to strengthen robust aggregation in real-world, privacy-preserving FL deployments.
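The TL;DR names three ingredients. The sketch below illustrates two of them, cosine distance and median-norm clipping, plugged into Krum; the layerwise split is left out. It is a minimal illustration under our own assumptions, not the authors' implementation: we assume "median gradient clipping" means rescaling every update whose norm exceeds the median client norm down to that median, and the helper names (`clip_to_median_norm`, `cosine_distance`, `cosine_krum`) are hypothetical.

```python
import math
import statistics
from typing import List

Vector = List[float]


def norm(v: Vector) -> float:
    """Euclidean norm of a flat update vector."""
    return math.sqrt(sum(x * x for x in v))


def clip_to_median_norm(updates: List[Vector]) -> List[Vector]:
    # Assumed reading of "median gradient clipping": any update whose norm
    # exceeds the median client norm is scaled down to exactly that norm.
    median = statistics.median([norm(u) for u in updates])
    clipped = []
    for u in updates:
        n_u = norm(u)
        scale = median / n_u if n_u > median else 1.0  # n_u > median implies n_u > 0
        clipped.append([scale * x for x in u])
    return clipped


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cos(u, v); ranges from 0 (same direction) to 2 (opposite)."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 1.0  # assumption: treat a zero update as orthogonal to everything
    return 1.0 - sum(a * b for a, b in zip(u, v)) / (nu * nv)


def cosine_krum(updates: List[Vector], f: int) -> Vector:
    """Krum selection with cosine distance in place of squared Euclidean distance."""
    n = len(updates)
    if n <= 2 * f + 2:
        raise ValueError("Krum needs n > 2f + 2 clients")
    clipped = clip_to_median_norm(updates)
    k = n - f - 2  # number of nearest neighbours summed in each score
    scores = []
    for i, u in enumerate(clipped):
        dists = sorted(cosine_distance(u, v) for j, v in enumerate(clipped) if j != i)
        scores.append(sum(dists[:k]))
    best = min(range(n), key=scores.__getitem__)
    return clipped[best]


# Toy round: five honest clients pointing roughly along (1, 1, 1), one attacker.
honest = [[1.0, 1.0, 0.9], [0.9, 1.1, 1.0], [1.1, 0.9, 1.0],
          [1.0, 1.0, 1.1], [0.95, 1.05, 1.0]]
byzantine = [[-10.0, -10.0, -10.0]]
agg = cosine_krum(honest + byzantine, f=1)
print(agg)
```

Because cosine distance is scale-invariant, clipping does not change which client this sketch selects; it only bounds the norm of the returned update by the median client norm.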

Abstract

The rapid development of artificial intelligence systems has amplified societal concerns about their use, creating a need for regulatory frameworks that address data privacy. Federated Learning (FL) has been proposed as a potential solution to data privacy challenges in distributed machine learning, since it enables collaborative model training without sharing data. However, FL systems remain vulnerable to Byzantine attacks, in which malicious nodes contribute corrupted model updates. Byzantine-resilient operators have become widely adopted robust aggregation algorithms for mitigating these attacks, but their efficacy diminishes significantly in high-dimensional parameter spaces, sometimes yielding poorly performing models. This paper introduces Layerwise Cosine Aggregation, a novel aggregation scheme designed to enhance the robustness of these rules in such high-dimensional settings while preserving computational efficiency. A theoretical analysis is presented, demonstrating the superior robustness of the proposed Layerwise Cosine Aggregation compared to the original robust aggregation operators. Empirical evaluation across diverse image classification datasets, under varying data distributions and Byzantine attack scenarios, consistently shows improved performance for Layerwise Cosine Aggregation, with up to a 16% increase in model accuracy.

Paper Structure

This paper contains 18 sections, 2 theorems, 12 equations, 8 figures.

Key Result

Proposition 1

The layerwise application of an $(\alpha, f)$-Byzantine Resilient rule $\mathcal{F}$, $L\mathcal{F}$, is also an $(\alpha, f)$-Byzantine Resilient rule.
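For reference, the standard notion of $(\alpha, f)$-Byzantine resilience was introduced together with Krum by Blanchard et al. (2017), and we assume Definition 1 follows it. The block below restates that definition and writes the layerwise rule $L\mathcal{F}$ as a per-layer concatenation. The block notation $V_i^{(l)}$ (the coordinates of $V_i$ that belong to layer $l$) is our own and may differ from the paper's.

```latex
% Setting: n vectors V_1, ..., V_n; n - f of them are i.i.d. copies of a random
% vector G with \mathbb{E}[G] = g, the remaining f are arbitrary (Byzantine).
% Let 0 \le \alpha < \pi/2 and F = \mathcal{F}(V_1, \dots, V_n).
\mathcal{F} \text{ is } (\alpha, f)\text{-Byzantine resilient if }
\begin{cases}
\langle \mathbb{E}[F],\, g \rangle \;\ge\; (1 - \sin\alpha)\,\lVert g \rVert^2 \;>\; 0, \\[4pt]
\mathbb{E}\lVert F \rVert^r \text{ is bounded above by a linear combination of }
\mathbb{E}\lVert G \rVert^{r_1} \cdots \mathbb{E}\lVert G \rVert^{r_{n-1}},\;
r_1 + \dots + r_{n-1} = r, \quad r \in \{2, 3, 4\}.
\end{cases}

% Layerwise rule: split every V_i into per-layer blocks V_i^{(1)}, ..., V_i^{(L)}
% and aggregate each block independently.
L\mathcal{F}(V_1, \dots, V_n) =
\bigl(\mathcal{F}(V_1^{(1)}, \dots, V_n^{(1)}),\; \dots,\; \mathcal{F}(V_1^{(L)}, \dots, V_n^{(L)})\bigr)
```

Proposition 1 then states that if $\mathcal{F}$ satisfies both conditions for every such split, so does $L\mathcal{F}$.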

Figures (8)

  • Figure 1: Parameter Distribution across Layers in a Two-Layer CNN used in some of the experiments. This figure highlights the significant parameter imbalance inherent in a basic two-layer CNN. A histogram directly compares the parameter count of the second convolutional layer (purple) and the first dense layer (red), revealing an order-of-magnitude difference. This disparity underscores the typical parameter distribution imbalance in such architectures. Detailed architectural specifications are provided in the paper's section on experimental details.
  • Figure 2: In a layerwise aggregation approach, original vectors are projected into orthogonal subspaces that partition the original space. Each set of projected vectors is then aggregated, and the resulting aggregated vectors are concatenated to produce a vector in the original space.
  • Figure 3: Test loss versus training round on multiple image classification datasets for Krum.
  • Figure 5: Test loss versus training round on multiple image classification datasets for GeoMed.
  • Figure 7: Test loss versus training round on multiple image classification datasets for Bulyan.
  • ...and 3 more figures
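The layerwise scheme in Figure 2 can be sketched as follows, assuming each layer's parameters occupy a contiguous slice of the flattened update, so that cutting the vector by layer is exactly a projection onto orthogonal coordinate subspaces. To isolate the layerwise step, the per-layer rule here is plain Euclidean Krum rather than the paper's cosine-distance variant; the helper names and the three-parameter toy model are hypothetical. It also relates to Figure 1: in whole-vector Krum the squared distance is a sum over layers, so a layer with many parameters tends to dominate the score, whereas the layerwise version scores each layer on its own.

```python
from typing import Callable, List

Vector = List[float]
Rule = Callable[[List[Vector], int], Vector]


def split_by_layers(v: Vector, layer_sizes: List[int]) -> List[Vector]:
    """Cut a flattened update into consecutive per-layer blocks."""
    if sum(layer_sizes) != len(v):
        raise ValueError("layer sizes must add up to the vector length")
    blocks, start = [], 0
    for size in layer_sizes:
        blocks.append(v[start:start + size])
        start += size
    return blocks


def krum(updates: List[Vector], f: int) -> Vector:
    """Plain Krum with squared Euclidean distance (Blanchard et al., 2017)."""
    n = len(updates)
    if n <= 2 * f + 2:
        raise ValueError("Krum needs n > 2f + 2 clients")
    k = n - f - 2  # number of nearest neighbours summed in each score
    scores = []
    for i, u in enumerate(updates):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(u, v))
            for j, v in enumerate(updates) if j != i
        )
        scores.append(sum(dists[:k]))
    return list(updates[min(range(n), key=scores.__getitem__)])


def layerwise(rule: Rule, updates: List[Vector], f: int,
              layer_sizes: List[int]) -> Vector:
    """Apply `rule` to each layer's block separately and concatenate the results."""
    per_client = [split_by_layers(u, layer_sizes) for u in updates]
    aggregated: Vector = []
    for layer in range(len(layer_sizes)):
        aggregated.extend(rule([blocks[layer] for blocks in per_client], f))
    return aggregated


# Hypothetical toy model: layer 0 has 1 parameter, layer 1 has 2.
updates = [
    [0.0, 1.0, 0.0],
    [0.25, 0.0, 2.0],
    [0.5, 0.0, 0.0],
    [1.0, -3.0, 0.0],
    [50.0, 0.0, -1.0],  # Byzantine client poisoning the small layer
]
print(krum(updates, f=1))                                 # [0.5, 0.0, 0.0]
print(layerwise(krum, updates, f=1, layer_sizes=[1, 2]))  # [0.25, 0.0, 0.0]
```

In this toy round the layerwise aggregate takes layer 0 from the client at index 1 and layer 1 from the client at index 2, so it matches no single client's update, while whole-vector Krum returns the client at index 2 unchanged. Both reject the poisoned client here; the example shows the mechanics only and says nothing about the accuracy gains reported in the paper.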

Theorems & Definitions (5)

  • Definition 1
  • Proposition 1
  • Proof
  • Corollary 1
  • Proof