Protection against Source Inference Attacks in Federated Learning

Andreas Athanasiou, Kangsoo Jung, Catuscia Palamidessi

TL;DR

This work proposes a defense against Source Inference Attacks (SIAs) in the widely studied shuffle model of FL, where an honest shuffler acts as an intermediary between the clients and the server. The defense is a novel combination of parameter-level shuffling with the residue number system (RNS).

Abstract

Federated Learning (FL) was initially proposed as a privacy-preserving machine learning paradigm. However, FL has been shown to be susceptible to a series of privacy attacks. Recently, there has been concern about the Source Inference Attack (SIA), in which an honest-but-curious central server attempts to identify exactly which client owns a given data point that was used in the training phase. Alarmingly, standard gradient obfuscation techniques with Differential Privacy have been shown to be ineffective against SIAs, at least without severely diminishing the accuracy. In this work, we propose a defense against SIAs within the widely studied shuffle model of FL, where an honest shuffler acts as an intermediary between the clients and the server. First, we demonstrate that standard naive shuffling alone is insufficient to prevent SIAs. To effectively defend against SIAs, shuffling needs to be applied at a more granular level; we propose a novel combination of parameter-level shuffling with the residue number system (RNS). Our approach provides robust protection against SIAs without affecting the accuracy of the joint model and can be seamlessly integrated into other privacy protection mechanisms. We conduct experiments on a series of models and datasets, confirming that standard shuffling approaches fail to prevent SIAs and that, in contrast, our proposed method reduces the attack's accuracy to the level of random guessing.
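To make the difference in shuffling granularity concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes each client update is a short list of integers, and the function names (`shuffle_model_level`, `shuffle_parameter_level`, `column_sums`) are hypothetical. It only illustrates that model-level shuffling keeps every client's update intact as one row, whereas parameter-level shuffling mixes values across clients independently for each parameter, and that neither changes the per-parameter sums the server aggregates. The RNS step is omitted here.

```python
import random

def shuffle_model_level(updates, rng):
    """Permute whole client updates; each row stays intact."""
    shuffled = [list(u) for u in updates]
    rng.shuffle(shuffled)
    return shuffled

def shuffle_parameter_level(updates, rng):
    """Permute each parameter (column) independently across clients."""
    columns = [list(col) for col in zip(*updates)]
    for col in columns:
        rng.shuffle(col)
    # Transpose back so each row is what the server sees as one "update".
    return [list(row) for row in zip(*columns)]

def column_sums(updates):
    """Per-parameter sum over all clients (the aggregate before averaging)."""
    return [sum(col) for col in zip(*updates)]

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy updates: 3 clients, 3 integer parameters each.
    updates = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(column_sums(shuffle_model_level(updates, rng)))      # [12, 15, 18]
    print(column_sums(shuffle_parameter_level(updates, rng)))  # [12, 15, 18]
```

Because addition is commutative, the aggregate is the same under either shuffle; what differs is whether a row received by the server still corresponds to a single client's full update.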

Paper Structure

This paper contains 60 sections, 7 theorems, 29 equations, 15 figures, 12 tables, and 9 algorithms.

Key Result

Proposition 3.1

Any integer $x \in [-\lfloor \frac{M}{2}\rfloor, \lfloor \frac{M-1}{2}\rfloor ] \cap \mathbb{Z}$ can be losslessly encoded in the RNS, where $M = \prod_i m_i$ is the product of the moduli $m_i$.
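The proposition can be checked numerically with a small Python sketch. It assumes the moduli are pairwise coprime (the standard RNS requirement, not spelled out in the statement above) and reconstructs $x$ with the Chinese Remainder Theorem before mapping the result back into the centered range. The function names `rns_encode` and `rns_decode` are hypothetical, and the moduli $3, 5, 7$ are borrowed from the Figure 4 caption; this is an illustration of the encoding, not the paper's implementation.

```python
from math import prod

def rns_encode(x, moduli):
    """Return the residue of x modulo each modulus."""
    # Python's % always yields a non-negative result for positive moduli.
    return [x % m for m in moduli]

def rns_decode(residues, moduli):
    """Recover x via the CRT, then map it into the centered range."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse (Python 3.8+);
        # it exists because the moduli are assumed pairwise coprime.
        total += r * Mi * pow(Mi, -1, m)
    x = total % M
    # Representatives above floor((M-1)/2) stand for negative integers.
    return x - M if x > (M - 1) // 2 else x

if __name__ == "__main__":
    moduli = [3, 5, 7]          # M = 105, so the range is [-52, 52]
    M = prod(moduli)
    for x in range(-(M // 2), (M - 1) // 2 + 1):
        assert rns_decode(rns_encode(x, moduli), moduli) == x
    print(rns_encode(-1, moduli))  # [2, 4, 6]
```

The range in the proposition contains exactly $M$ integers, one per residue tuple, which is why the round trip succeeds for every value in it and fails only for values outside it.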

Figures (15)

  • Figure 1: Success rate of SIA for $10$ clients
  • Figure 2: Communication Cost (bits per client per parameter).
  • Figure 3: Model Accuracy (top-1) of Alg. \ref{alg:param_rns} with precision $r$ compared to vanilla FL ($10$ clients).
  • Figure 4: Example of Alg. \ref{alg:param_rns} with moduli $3,5,7$, for a parameter $p$, where $r=2$ and $n=2$.
  • Figure 5: Shuffling methods in FL with different granularity
  • ...and 10 more figures

Theorems & Definitions (14)

  • Definition 1: Source inference attack
  • Definition 2: Unary encoding
  • Definition 3: RNS encoding
  • Proposition 3.1
  • Theorem 1 (LaTeX label: prop:perfect_protection)
  • Proposition A.1
  • Proposition A.2
  • proof
  • Proposition A.3
  • proof
  • ...and 4 more