Analyzing Federated Learning through an Adversarial Lens

Arjun Nitin Bhagoji, Supriyo Chakraborty, Prateek Mittal, Seraphin Calo

TL;DR

This work demonstrates that a single, non-colluding adversary can perform targeted model poisoning in federated learning, molding the global model to misclassify chosen inputs with high confidence while preserving convergence on benign data. It introduces boosting, stealth objectives, and alternating minimization to enhance attack effectiveness and concealment, and shows these strategies remain potent even against Byzantine-resilient aggregators like Krum and coordinate-wise median. The study also explores estimation-based improvements to attack performance and finds that informed predictions of other agents' updates bolster success, while data-poisoning is comparatively ineffective in this setting. Finally, it reveals that current interpretability methods fail to distinguish poisoned from benign models, underscoring the need for robust defenses and improved explanations in FL scenarios.

Abstract

Federated learning distributes model training among a multitude of agents, who, guided by privacy concerns, perform training using their local data but share only model parameter updates, for iterative aggregation at the server. In this work, we explore the threat of model poisoning attacks on federated learning initiated by a single, non-colluding malicious agent where the adversarial objective is to cause the model to misclassify a set of chosen inputs with high confidence. We explore a number of strategies to carry out this attack, starting with simple boosting of the malicious agent's update to overcome the effects of other agents' updates. To increase attack stealth, we propose an alternating minimization strategy, which alternately optimizes for the training loss and the adversarial objective. We follow up by using parameter estimation for the benign agents' updates to improve on attack success. Finally, we use a suite of interpretability techniques to generate visual explanations of model decisions for both benign and malicious models and show that the explanations are nearly visually indistinguishable. Our results indicate that even a highly constrained adversary can carry out model poisoning attacks while simultaneously maintaining stealth, thus highlighting the vulnerability of the federated learning setting and the need to develop effective defense strategies.
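The boosting step described in the abstract can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: it assumes the server adds an equally weighted average of the agents' updates to the global weights, and it assumes the malicious agent scales its update by the number of agents $K$ to cancel that averaging. The names `federated_average`, `boost`, and all numeric values are hypothetical.

```python
from typing import List

Vector = List[float]


def federated_average(global_weights: Vector,
                      updates: List[Vector],
                      alphas: List[float]) -> Vector:
    """Return w + sum_k alpha_k * delta_k (assumed aggregation rule)."""
    new_weights = list(global_weights)
    for alpha, delta in zip(alphas, updates):
        for i, d in enumerate(delta):
            new_weights[i] += alpha * d
    return new_weights


def boost(update: Vector, factor: float) -> Vector:
    """Scale an update so it survives down-weighting at the server."""
    return [factor * d for d in update]


K = 10                                   # total agents, one of them malicious
alphas = [1.0 / K] * K                   # assumed equal aggregation weights
global_weights = [0.0, 0.0]
benign_updates = [[0.1, -0.1] for _ in range(K - 1)]
malicious_update = [1.0, 2.0]            # toy direction favoring the adversarial objective

# Without boosting, the malicious update is diluted by a factor of K.
plain = federated_average(
    global_weights, benign_updates + [malicious_update], alphas)

# With boosting by K, the malicious update enters the aggregate at full strength.
boosted = federated_average(
    global_weights, benign_updates + [boost(malicious_update, K)], alphas)

print("plain:  ", plain)
print("boosted:", boosted)
```

In this toy setting the boosted aggregate moves almost exactly by the malicious update, while the unboosted one stays close to the benign average. The same scaling also makes the update an outlier in norm, which is the stealth problem the alternating minimization strategy is meant to address.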

Paper Structure

This paper contains 32 sections, 5 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Targeted model poisoning attack for CNN on Fashion MNIST data. The total number of agents is $K=10$, including the malicious agent. All agents train their local models for 5 epochs with the appropriate objective.
  • Figure 2: Stealthy model poisoning for CNN on Fashion MNIST. We use $\lambda=10$ and $\rho=10^{-4}$ for the malicious agent's objective.
  • Figure 3: Alternating minimization attack with distance constraints for CNN on Fashion MNIST data. We use $\lambda=10$ and $\rho=10^{-4}$. The number of epochs used by the malicious agent is $E_m=10$ and it runs $10$ steps of the stealth objective for every step of the malicious objective.
  • Figure 4: Range of $\ell_2$ distances between all benign agents and between the malicious agent and the benign agents.
  • Figure 5: Model poisoning attacks with Byzantine-resilient aggregation mechanisms. We use targeted model poisoning for coordinate-wise median (coomed) and alternating minimization for Krum.
  • ...and 9 more figures
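
The two Byzantine-resilient aggregation rules named in the Figure 5 caption can be sketched as follows. This is a hedged illustration of the standard definitions of coordinate-wise median and Krum, not code from the paper: Krum is assumed to score each update by the sum of squared $\ell_2$ distances to its $n - f - 2$ nearest neighbors and to select the lowest-scoring update, where $f$ is the assumed number of Byzantine agents. The function names and the toy updates are hypothetical, and the example shows only the aggregation rules, not the attacks evaluated against them.

```python
from statistics import median
from typing import List

Vector = List[float]


def coordinate_wise_median(updates: List[Vector]) -> Vector:
    """Aggregate by taking the median of each coordinate across agents."""
    return [median(coords) for coords in zip(*updates)]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two updates."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum(updates: List[Vector], num_byzantine: int) -> Vector:
    """Select the update closest to its n - f - 2 nearest neighbors."""
    n = len(updates)
    closest = n - num_byzantine - 2
    if closest < 1:
        raise ValueError("Krum needs n - f - 2 >= 1 neighbors to score against")
    scores = []
    for i, u in enumerate(updates):
        # Distances from update i to every other update, smallest first.
        dists = sorted(squared_l2(u, v)
                       for j, v in enumerate(updates) if j != i)
        scores.append(sum(dists[:closest]))
    best = min(range(n), key=scores.__getitem__)
    return updates[best]


# Four benign-looking updates and one large outlier (toy values).
updates = [
    [0.1, -0.1],
    [0.2, -0.2],
    [0.15, -0.1],
    [0.1, -0.15],
    [10.0, 20.0],
]

median_update = coordinate_wise_median(updates)
krum_update = krum(updates, num_byzantine=1)

print("coordinate-wise median:", median_update)
print("Krum selection:        ", krum_update)
```

Both rules discard the large outlier in this toy case, which is why a naively boosted update is easy to filter. The attacks in Figure 5 instead use updates that stay close to the benign ones.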