GuardFed: A Trustworthy Federated Learning Framework Against Dual-Facet Attacks
Yanli Li, Yanan Zhou, Zhongliang Guo, Nan Yang, Yuning Zhang, Huaming Chen, Dong Yuan, Weiping Ding, Witold Pedrycz
TL;DR
This paper addresses the vulnerability of federated learning (FL) to dual-objective attacks that degrade both accuracy and group fairness. It introduces the Dual-Facet Attack (DFA), with two variants, Synchronous DFA (S-DFA) and Split DFA (Sp-DFA), and a defense framework, GuardFed. GuardFed builds a fairness-aware reference model from synthetic server data generated via a Gaussian Copula and uses a dual-perspective trust score to selectively aggregate client updates. Experiments on the COMPAS and ADULT datasets under varying non-IID conditions show that DFA can significantly degrade existing robust and fairness-aware FL methods, while GuardFed achieves state-of-the-art performance in both accuracy and fairness, even under strong adversarial pressure. The results indicate GuardFed’s practical potential for trustworthy and fair federated learning without requiring large-scale server data or stringent trust assumptions.
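To illustrate the idea of generating synthetic server data with a Gaussian Copula, the following is a minimal two-column sketch, not the authors' pipeline. The function names (`normal_scores`, `pearson`, `sample_copula`), the rank-based marginals, and the restriction to two columns are all assumptions made for illustration; ties in a column are broken arbitrarily by sort order.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def normal_scores(column):
    """Map each value to a standard-normal score through its empirical rank."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n lies strictly inside (0, 1), so inv_cdf is defined.
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def sample_copula(col_a, col_b, n_samples, rng):
    """Draw synthetic (a, b) pairs that keep the marginals and rank dependence."""
    n = len(col_a)
    rho = pearson(normal_scores(col_a), normal_scores(col_b))
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))  # guard against rounding
    samples = []
    for _ in range(n_samples):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual * rng.gauss(0.0, 1.0)
        # Map back through each column's empirical quantiles.
        idx_a = min(int(STD_NORMAL.cdf(z1) * n), n - 1)
        idx_b = min(int(STD_NORMAL.cdf(z2) * n), n - 1)
        samples.append((sorted_a[idx_a], sorted_b[idx_b]))
    return samples
```

Because samples are drawn from the empirical quantiles, every synthetic value already appears in the original column; a smoother marginal model would be needed to produce unseen values.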
Abstract
Federated learning (FL) enables privacy-preserving collaborative model training but remains vulnerable to adversarial behaviors that compromise model utility or fairness across sensitive groups. While extensive studies have examined attacks targeting either objective, strategies that simultaneously degrade both utility and fairness remain largely unexplored. To bridge this gap, we introduce the Dual-Facet Attack (DFA), a novel threat model that concurrently undermines predictive accuracy and group fairness. Two variants, Synchronous DFA (S-DFA) and Split DFA (Sp-DFA), are further proposed to capture distinct real-world collusion scenarios. Experimental results show that existing robust FL defenses, including hybrid aggregation schemes, fail to resist DFAs effectively. To counter these threats, we propose GuardFed, a self-adaptive defense framework that maintains a fairness-aware reference model using a small amount of clean server data augmented with synthetic samples. In each training round, GuardFed computes a dual-perspective trust score for every client by jointly evaluating its utility deviation and fairness degradation, thereby enabling selective aggregation of trustworthy updates. Extensive experiments on real-world datasets demonstrate that GuardFed consistently preserves both accuracy and fairness under diverse non-IID and adversarial conditions, achieving state-of-the-art performance compared with existing robust FL methods.
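The dual-perspective trust score and selective aggregation described above can be sketched as follows. This is a hedged illustration, not the paper's exact formulation: the demographic-parity gap as the fairness measure, the multiplicative combination of the two perspectives, the `threshold` value, the linear client models, and all names (`trust_score`, `aggregate`, `SERVER_DATA`) are assumptions introduced here.

```python
# Each sample is (features, label, sensitive_group), with group in {0, 1}.
# The toy data below is hypothetical; the second feature equals the group.
SERVER_DATA = [
    ([1.0, 0.0], 1, 0),
    ([1.0, 1.0], 1, 1),
    ([-1.0, 0.0], 0, 0),
    ([-1.0, 1.0], 0, 1),
]


def linear_model(weights):
    """Return a binary classifier that thresholds a dot product at zero."""
    return lambda x: int(sum(w * xi for w, xi in zip(weights, x)) > 0)


def accuracy(model, data):
    return sum(model(x) == y for x, y, _ in data) / len(data)


def parity_gap(model, data):
    """Absolute demographic-parity gap between the two sensitive groups."""
    rates = []
    for group in (0, 1):
        preds = [model(x) for x, _, g in data if g == group]
        rates.append(sum(preds) / len(preds) if preds else 0.0)
    return abs(rates[0] - rates[1])


def trust_score(client, reference, data):
    """Combine utility and fairness perspectives into a score in [0, 1]."""
    # Utility perspective: accuracy lost relative to the reference model.
    utility_drop = max(0.0, accuracy(reference, data) - accuracy(client, data))
    # Fairness perspective: parity gap added relative to the reference model.
    fairness_drop = max(0.0, parity_gap(client, data) - parity_gap(reference, data))
    return (1.0 - utility_drop) * (1.0 - fairness_drop)


def aggregate(updates, reference_weights, data, threshold=0.8):
    """Trust-weighted average of the client weight vectors that pass the threshold."""
    reference = linear_model(reference_weights)
    scores = [trust_score(linear_model(w), reference, data) for w in updates]
    kept = [(s, w) for s, w in zip(scores, updates) if s >= threshold]
    if not kept:
        # No trustworthy update this round: fall back to the reference weights.
        return list(reference_weights), scores
    total = sum(s for s, _ in kept)
    dim = len(reference_weights)
    return [sum(s * w[i] for s, w in kept) / total for i in range(dim)], scores
```

On `SERVER_DATA` with reference weights `[1.0, 0.0]`, a client that predicts from the sensitive feature alone (`[0.0, 1.0]`) and a client with flipped predictions (`[-1.0, 0.0]`) both receive a trust score of 0 and are excluded, whereas a client close to the reference (`[0.9, 0.1]`) is kept.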
