Attribute Inference Attacks for Federated Regression Tasks

Francesco Diana, Othmane Marfoq, Chuan Xu, Giovanni Neglia, Frédéric Giroire, Eoin Thomas

TL;DR

The paper addresses privacy leakage in federated regression by proposing a model-based AIA that first reconstructs a targeted client's optimal local model and then runs attribute inference against that model. It provides a theoretical lower bound on AIA accuracy for least squares regression and shows that model-based attacks can outperform gradient-based approaches, especially under data heterogeneity and with active adversaries. Experiments on Medical and Income datasets show substantial improvements over state-of-the-art gradient-based AIAs, and DP-SGD offers only partial protection. The work highlights practical privacy risks in FL regression and motivates defenses that go beyond standard differential privacy.
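The second step, inference against a fixed model, can be illustrated for a linear regression model. This is a minimal sketch under our own assumptions, not the paper's implementation: the adversary is assumed to know each sample's label and non-sensitive features, the binary sensitive attribute is assumed to occupy one known column, and `model_based_aia` is a hypothetical helper name. For every sample, the sketch tries each candidate attribute value and keeps the one under which the model's squared error is smallest.

```python
import numpy as np


def model_based_aia(theta, X, y, s_index, candidates=(0.0, 1.0)):
    """Hypothetical model-based AIA against a linear regression model.

    theta:   (d,) model under attack (e.g. a reconstructed optimal local model).
    X:       (n, d) features; column s_index holds the sensitive attribute, whose
             values are NOT read: each candidate overwrites them on a copy.
    y:       (n,) labels, assumed known to the adversary.
    Returns, for every sample, the candidate with the smallest squared residual.
    """
    losses = []
    for v in candidates:
        X_guess = X.copy()
        X_guess[:, s_index] = v  # hypothesise this attribute value for all samples
        losses.append((X_guess @ theta - y) ** 2)
    best = np.argmin(np.stack(losses), axis=0)  # index of the lowest-loss candidate
    return np.asarray(candidates)[best]


# Usage on synthetic data (all values are illustrative).
rng = np.random.default_rng(0)
n, s_index = 500, 0
theta = np.array([2.0, 1.0, -0.5, 0.3])
X = rng.normal(size=(n, theta.size))
s = rng.integers(0, 2, size=n).astype(float)  # true binary sensitive attribute
X[:, s_index] = s
y = X @ theta + 0.1 * rng.normal(size=n)

s_hat = model_based_aia(theta, X, y, s_index)
accuracy = np.mean(s_hat == s)
print(f"AIA accuracy: {accuracy:.3f}")
```

Because the attack scores candidates with the attacked model's own loss, its accuracy depends on how well $\theta$ fits the client's data relative to the magnitude of $\theta[s]$, which is why reconstructing the client's optimal local model first matters.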

Abstract

Federated Learning (FL) enables multiple clients, such as mobile phones and IoT devices, to collaboratively train a global machine learning model while keeping their data localized. However, recent studies have revealed that the training phase of FL is vulnerable to reconstruction attacks, such as attribute inference attacks (AIAs), where adversaries exploit exchanged messages and auxiliary public information to uncover sensitive attributes of targeted clients. While these attacks have been extensively studied in the context of classification tasks, their impact on regression tasks remains largely unexplored. In this paper, we address this gap by proposing novel model-based AIAs specifically designed for regression tasks in FL environments. Our approach considers scenarios where adversaries can either eavesdrop on exchanged messages or directly interfere with the training process. We benchmark our proposed attacks against state-of-the-art methods using real-world datasets. The results demonstrate a significant increase in reconstruction accuracy, particularly when client datasets are heterogeneous, a common scenario in FL. The efficacy of our model-based AIAs makes them better candidates for empirically quantifying privacy leakage in federated regression tasks.

Paper Structure

This paper contains 51 sections, 5 theorems, 32 equations, 6 figures, 3 tables, 5 algorithms.

Key Result

Proposition 1

Let $E_c$ be the mean squared error of a given least squares regression model $\theta$ on the local dataset of client $c$, and let $\theta[s]$ be the model parameter corresponding to a binary sensitive attribute $s$. The accuracy of the model-based AIA (Eq. aia_model_based) is greater than or equal to $1-\frac{4E
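A short Markov-inequality argument sketches why a bound of this form arises. It assumes (our assumption, not necessarily the paper's exact setup) that the model-based AIA assigns each sample whichever value of $s \in \{0,1\}$ gives the smaller squared residual under $\theta$, with ties counted as errors. Here $x_i$ contains the true attribute $s_i$, and $S_c$ is the size of client $c$'s local dataset. This is not the paper's proof; the precise statement and conditions are those of Proposition 1.

$$
\begin{aligned}
r_i &= y_i - \theta^\top x_i, \qquad E_c = \frac{1}{S_c}\sum_{i=1}^{S_c} r_i^2,\\
r_i' &= r_i - (1-2s_i)\,\theta[s] \quad \text{(residual after flipping } s_i\text{)},\\
\text{error on } i &\;\Rightarrow\; |r_i'| \le |r_i| \;\Rightarrow\; |\theta[s]| = |r_i - r_i'| \le 2|r_i| \;\Rightarrow\; r_i^2 \ge \tfrac{\theta[s]^2}{4},\\
1 - \text{accuracy} &\le \frac{1}{S_c}\,\bigl|\{i : r_i^2 \ge \theta[s]^2/4\}\bigr| \le \frac{4E_c}{\theta[s]^2} \quad \text{(Markov's inequality)}.
\end{aligned}
$$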

Figures (6)

  • Figure 1: Average performance of different AIAs when four clients train a neural network through FedAvg with 1 local epoch and batch size 32. Each client stores $S_c$ data points randomly sampled from the ACS Income dataset [income_dataset]. The adversary infers the gender attribute of every data sample held by the client, given access to the released (public) information.
  • Figure 2: Performance of our passive approach for reconstructing the optimal local model (left) and of the triggered AIA (right) on a toy dataset with two clients training a linear model of size $d=11$, under batch sizes 64, 256, and 1024, with 5 seeds each. The passive adversary eavesdrops on only $d+1$ messages.
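The $d+1$ message count in Figure 2 can be motivated with a sketch of one way a passive adversary could recover a client's least-squares optimum. The assumptions are ours and this is not necessarily the paper's procedure: the client runs full-batch gradient descent locally, so its returned model is an exact affine function $\theta_{\text{out}} = M\theta_{\text{in}} + c$ of the global model it receives; $d+1$ (sent, returned) pairs then determine $M$ and $c$, and the fixed point of that map is the minimizer of the local loss. The names `local_update` and `reconstruct_optimum` are hypothetical.

```python
import numpy as np


def local_update(theta, X, y, lr=0.1, steps=1):
    """Hypothetical client update: full-batch gradient descent on the MSE loss.

    Each step is theta <- theta - lr * X^T (X theta - y) / n, an affine map of theta.
    """
    n = X.shape[0]
    for _ in range(steps):
        theta = theta - lr * X.T @ (X @ theta - y) / n
    return theta


def reconstruct_optimum(thetas_in, thetas_out):
    """Recover the client's least-squares optimum from d+1 (sent, returned) pairs.

    Assumes theta_out = M @ theta_in + c exactly; the optimum is the fixed point
    theta* = M theta* + c, i.e. theta* = (I - M)^{-1} c.
    """
    k, d = thetas_in.shape  # k must equal d + 1
    # Append a constant 1 so the affine map becomes linear: [theta_in, 1] @ W.
    A = np.hstack([thetas_in, np.ones((k, 1))])  # shape (d+1, d+1)
    W = np.linalg.solve(A, thetas_out)           # shape (d+1, d), W = [M^T; c^T]
    M, c = W[:d].T, W[d]
    return np.linalg.solve(np.eye(d) - M, c)


# Usage on synthetic data (all values are illustrative).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Global models broadcast by the server in d+1 rounds, and the client's replies.
thetas_in = rng.normal(size=(d + 1, d))
thetas_out = np.array([local_update(t, X, y) for t in thetas_in])

theta_hat = reconstruct_optimum(thetas_in, thetas_out)
theta_star = np.linalg.lstsq(X, y, rcond=None)[0]  # client's true local optimum
print(np.max(np.abs(theta_hat - theta_star)))  # reconstruction error (zero in exact arithmetic)
```

With mini-batch SGD the client update is affine only in expectation, so under this sketch the exact recovery becomes an approximation whose quality depends on the batch size.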
  • Figure 3: AIA accuracy over all clients' local datasets under different heterogeneity levels (left; $0\%$ corresponds to the i.i.d. case), batch sizes (center), and numbers of local epochs (right) for the Income-L dataset. The default heterogeneity level, batch size $B$, and number of local epochs $E$ are $40\%$, $32$, and $1$, respectively. The malicious adversary attacks for $\lceil 50/E \rceil$ rounds, starting after $\lceil 100/E \rceil$ communication rounds. Crosses represent passive attacks and dots represent active attacks. Dashed lines correspond to gradient-based attacks (Grad), and solid lines correspond to model-based attacks (Ours and Model-w-O).
  • Figure 4: AIA accuracy on the Income-A dataset for clients with different local dataset sizes $S_c$. The experimental setting is the same as in Table [tab:attacks_32], with 50 active rounds.
  • Figure 5: AIA accuracy over all clients' local datasets for different starting rounds of the active attack on the Income-L dataset ($40\%$ heterogeneity level). The clients train a neural network through FedAvg with 1 local epoch and batch size 32.

Theorems & Definitions (10)

  • Proposition 1
  • proof
  • Theorem 1: Informal statement
  • Proposition 2
  • Proposition 3
  • proof
  • Lemma 2
  • proof
  • proof
  • proof