Attacks on fairness in Federated Learning

Joseph Rance, Filip Svoboda

TL;DR

The paper addresses the risk of fairness violations in Federated Learning by introducing a novel attack that targets attribute-level fairness. It develops a formal threat model in which one or a few malicious clients steer the aggregated model toward uneven performance across subpopulations. Under FedAvg, the aggregated update is written as $m = (n_0/n) v + (1/n) \sum_i n_i u_i$, and solving for the malicious client's update $v$ gives $v = (n m - \sum_i n_i u_i)/n_0$. Empirical evaluation on CIFAR-10 with a ResNet-50 in a simulated FL setting shows that a single compromised client can produce a significant accuracy disparity between the targeted classes and the rest, with the effect depending on the number of participating honest clients. The work emphasizes the need for fairness-aware defenses in FL and discusses how existing backdoor defenses might be adapted to counteract these fairness attacks, with practical implications for collaborative systems where unfair outcomes could harm participants.
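
The algebra in the TL;DR can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes that $n = n_0 + \sum_i n_i$, that updates are flat vectors, and that the honest updates $u_i$ are known exactly (the Figure 1 caption suggests that in practice the attacker works with a prediction of them). The function names `fedavg` and `malicious_update` are hypothetical.

```python
import numpy as np


def fedavg(updates, counts):
    """Weighted average of client updates, weighted by datapoint counts."""
    updates = np.asarray(updates, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return (counts[:, None] * updates).sum(axis=0) / counts.sum()


def malicious_update(target, honest_updates, honest_counts, n0):
    """Solve m = (n0/n) v + (1/n) sum_i n_i u_i for v.

    Assumption: n = n0 + sum_i n_i, i.e. the total reported datapoints.
    """
    target = np.asarray(target, dtype=float)
    honest_updates = np.asarray(honest_updates, dtype=float)
    honest_counts = np.asarray(honest_counts, dtype=float)
    n = n0 + honest_counts.sum()
    honest_sum = (honest_counts[:, None] * honest_updates).sum(axis=0)
    return (n * target - honest_sum) / n0


# Toy example: three honest clients, one malicious client, 4-dim updates.
rng = np.random.default_rng(0)
honest_updates = rng.normal(size=(3, 4))
honest_counts = [100, 100, 100]
n0 = 100
target = np.array([1.0, -1.0, 0.5, 0.0])

v = malicious_update(target, honest_updates, honest_counts, n0)
aggregated = fedavg(np.vstack([v, honest_updates]), [n0, *honest_counts])

# The aggregate should match the target up to floating-point error.
print(np.allclose(aggregated, target))
```

Because $v$ is scaled by $n/n_0$, the malicious update grows in magnitude as more honest datapoints take part, which is consistent with the TL;DR's remark that the effect depends on the number of honest clients.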

Abstract

Federated Learning is an important emerging distributed training paradigm that keeps data private on clients. It is now well understood that by controlling only a small subset of FL clients, it is possible to introduce a backdoor to a federated learning model, in the presence of certain attributes. In this paper, we present a new type of attack that compromises the fairness of the trained model. Fairness is understood to be the attribute-level performance distribution of a trained model. It is particularly salient in domains where, for example, skewed accuracy discrimination between subpopulations could have disastrous consequences. We find that by employing a threat model similar to that of a backdoor attack, an attacker is able to influence the aggregated model to have an unfair performance distribution between any given set of attributes. Furthermore, we find that this attack is possible by controlling only a single client. While combating naturally induced unfairness in FL has previously been discussed in depth, its artificially induced kind has been neglected. We show that defending against attacks on fairness should be a critical consideration in any situation where unfairness in a trained model could benefit a user who participated in its training.

Paper Structure (11 sections, 3 equations, 2 figures, 1 table)


Figures (2)

  • Figure 1: Illustration of how the update vectors aggregate to a value similar to the target. Here, we assume that each client reports the same number of datapoints, $n_i$. The left group of vectors represents the computation done by the malicious client to obtain the target update (purple), while the right group of vectors represents the work done by the aggregator to obtain the aggregated update (yellow). We want the predicted update (blue) to be close to the clean updates (green) so that the aggregated update (yellow) is close to the target update (red).
  • Figure 2: Accuracy of each class per training round of the aggregated model. The left column shows the accuracies for the 2 target classes, while the right shows accuracies for the 8 other classes. The rows show the 3, 10, and 30 client cases from top to bottom. The attack was started at round 80 and produces a clear accuracy discrepancy between its target classes (0 and 1) and the other classes.
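
The per-class accuracies plotted in Figure 2 can be computed from predictions along the following lines. This is an illustrative sketch only: the helper names `per_class_accuracy` and `accuracy_gap` are hypothetical, and the gap used here (mean target-class accuracy minus mean accuracy on the other classes) is an assumed summary statistic, not a metric taken from the paper.

```python
import numpy as np


def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy for each class; NaN for classes absent from y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = (y_pred[mask] == c).mean()
    return acc


def accuracy_gap(acc, target_classes):
    """Mean accuracy on target classes minus mean accuracy on the rest."""
    target_mask = np.zeros(len(acc), dtype=bool)
    target_mask[list(target_classes)] = True
    return np.nanmean(acc[target_mask]) - np.nanmean(acc[~target_mask])


# Toy labels for 4 classes, with classes 0 and 1 treated as the targets.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 2, 0, 1, 3]

acc = per_class_accuracy(y_true, y_pred, num_classes=4)
gap = accuracy_gap(acc, target_classes=[0, 1])
print(acc, gap)
```

Tracking this gap per training round would give a single curve that summarizes the two columns of the figure.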