Improving $(\alpha, f)$-Byzantine Resilience in Federated Learning via Layerwise Aggregation and Cosine Distance
Mario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón, Francisco Herrera
TL;DR
The Byzantine resilience of robust aggregation rules in Federated Learning (FL) degrades in high-dimensional parameter spaces. The authors propose Layerwise Cosine Aggregation, which combines layerwise partitioning of the model with cosine distance and median gradient clipping to improve robustness while preserving computational efficiency. They prove that applying a robust rule layer by layer preserves $(\alpha, f)$-Byzantine resilience, and they report accuracy gains of up to 16% over the baseline robust operators (Krum, Bulyan, GeoMed) across multiple image datasets and attack scenarios. The approach offers a scalable, practical enhancement to privacy-preserving FL that brings robust aggregation closer to real-world distributed deployments.
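The three ingredients named above (layerwise partitioning, cosine distance, median gradient clipping) can be illustrated for Krum with a minimal NumPy sketch. It rests on three assumptions. First, "median gradient clipping" is taken to mean rescaling each client's layer update so that its L2 norm is at most the median norm for that layer. Second, Krum's squared Euclidean distance is replaced directly by the cosine distance $1 - \cos(u_i, u_j)$. Third, each layer is aggregated independently and the results are reassembled. All function names are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def median_clip(updates: np.ndarray) -> np.ndarray:
    """Rescale each row so its L2 norm is at most the median row norm (assumed clipping rule)."""
    norms = np.linalg.norm(updates, axis=1)
    threshold = np.median(norms)
    # Guard against division by zero; all-zero rows keep scale 1 and stay zero.
    scale = np.minimum(1.0, threshold / np.maximum(norms, 1e-12))
    return updates * scale[:, None]


def cosine_distance_matrix(updates: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance 1 - cos(u_i, u_j) between the rows of `updates`."""
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    unit = updates / np.maximum(norms, 1e-12)
    return 1.0 - unit @ unit.T


def krum_cosine(updates: np.ndarray, f: int) -> np.ndarray:
    """Krum selection with cosine distance in place of squared Euclidean distance."""
    n = updates.shape[0]
    if n <= 2 * f + 2:
        raise ValueError("Krum requires n > 2f + 2 clients")
    dist = cosine_distance_matrix(updates)
    scores = []
    for i in range(n):
        others = np.delete(dist[i], i)           # distances to the other n - 1 clients
        nearest = np.sort(others)[: n - f - 2]   # the n - f - 2 closest neighbours
        scores.append(nearest.sum())
    # Return the single client update with the smallest neighbourhood score.
    return updates[int(np.argmin(scores))]


def layerwise_cosine_krum(client_layers, f):
    """Aggregate each layer separately.

    client_layers: list over clients, each a list of per-layer np.ndarray updates
    with identical shapes across clients (hypothetical data layout).
    """
    num_layers = len(client_layers[0])
    aggregated = []
    for layer in range(num_layers):
        shape = client_layers[0][layer].shape
        # Flatten this layer for every client: one row per client.
        stacked = np.stack([c[layer].ravel() for c in client_layers])
        clipped = median_clip(stacked)
        aggregated.append(krum_cosine(clipped, f).reshape(shape))
    return aggregated
```

Because cosine distance ignores magnitude, the Krum selection in this sketch depends only on the directions of the layer updates. The median clipping is what bounds the norm of the selected layer, which is one plausible reading of why the two are paired.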
Abstract
The rapid development of artificial intelligence systems has amplified societal concerns about their use, creating a need for regulatory frameworks that cover data privacy. Federated Learning (FL) has been proposed as a potential solution to data privacy challenges in distributed machine learning, since it enables collaborative model training without sharing raw data. However, FL systems remain vulnerable to Byzantine attacks, in which malicious nodes contribute corrupted model updates. Byzantine-resilient operators have become widely adopted robust aggregation algorithms for mitigating these attacks. However, their efficacy diminishes significantly in high-dimensional parameter spaces, sometimes yielding poorly performing models. This paper introduces Layerwise Cosine Aggregation, a novel aggregation scheme designed to enhance the robustness of these rules in such high-dimensional settings while preserving computational efficiency. A theoretical analysis demonstrates the superior robustness of Layerwise Cosine Aggregation compared with the original robust aggregation operators. Empirical evaluation across diverse image classification datasets, under varying data distributions and Byzantine attack scenarios, consistently demonstrates the improved performance of Layerwise Cosine Aggregation, with increases in model accuracy of up to 16%.
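For reference, the $(\alpha, f)$-Byzantine resilience property in the title is stated below in its standard form, the definition introduced together with Krum. The paper's exact formulation may differ in details. Let $0 \le \alpha < \pi/2$ and $0 \le f \le n$. Let $V_1, \dots, V_n$ be i.i.d. random vectors in $\mathbb{R}^d$ with $V_i \sim G$ and $\mathbb{E}G = g$. Let $B_1, \dots, B_f$ be arbitrary Byzantine vectors, possibly dependent on the $V_i$, that replace the honest vectors at any positions $1 \le j_1 < \dots < j_f \le n$. An aggregation rule $F$ is $(\alpha, f)$-Byzantine resilient if its output $F$ satisfies:

$$
\text{(i)}\quad \langle \mathbb{E}F,\, g \rangle \;\ge\; (1 - \sin\alpha)\,\lVert g \rVert^2 \;>\; 0,
$$

$$
\text{(ii)}\quad \mathbb{E}\lVert F \rVert^r \le \sum \text{(linear combination of) } \mathbb{E}\lVert G \rVert^{r_1} \cdots \mathbb{E}\lVert G \rVert^{r_{n-1}}, \qquad r_1 + \dots + r_{n-1} = r,\quad r \in \{2, 3, 4\}.
$$

Condition (i) guarantees that, in expectation, the aggregate has a positive inner product with the true gradient $g$, so it remains a descent direction. Condition (ii) bounds the low-order moments of the aggregate by those of the honest gradient distribution.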
