Subject Data Auditing via Source Inference Attack in Cross-Silo Federated Learning

Jiaxin Li, Marco Arazzi, Antonino Nocera, Mauro Conti

TL;DR

This paper proposes a Subject-Level Source Inference Attack (SLSIA) that lifts two limitations of prior work: the SIA constraint that only one client can use a target data point, and SMIA's imprecise detection of which clients use the target subject's data.

Abstract

Source Inference Attack (SIA) in Federated Learning (FL) aims to identify which client used a target data point for local model training. It allows the central server to audit clients' data usage. In cross-silo FL, a client (silo) collects data from multiple subjects (e.g., individuals, writers, or devices), posing a risk of subject information leakage. Subject Membership Inference Attack (SMIA) targets this scenario and attempts to infer whether any client utilizes data points from a target subject in cross-silo FL. However, existing results on SMIA are limited and rest on strong assumptions about the attack scenario. Therefore, we propose a Subject-Level Source Inference Attack (SLSIA) that removes two critical limitations: the constraint in SIA that only one client can use a target data point, and the imprecise detection in SMIA of which clients utilize the target subject's data. The attacker, positioned on the server side, controls a target data source and aims to detect all clients using data points from the target subject. Our strategy leverages a binary attack classifier to predict whether the embeddings returned by a local model on test data from the target subject include unique patterns indicating that the client trained the model with data from that subject. To achieve this, the attacker locally pre-trains models using data derived from the target subject and then leverages them to build a training set for the binary attack classifier. Our SLSIA significantly outperforms previous methods on three datasets. Specifically, SLSIA achieves a maximum average accuracy of 0.88 over 50 target subjects. Analyzing embedding distribution and input feature distance shows that datasets with sparse subjects are more susceptible to our attack. Finally, we propose defending against SLSIA with item-level and subject-level differential privacy mechanisms.
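
The three stages described in the abstract (pre-training models, training a binary attack classifier on their embeddings, and evaluating clients' local models) can be illustrated with a toy sketch. Everything below is an assumption made for illustration rather than the authors' implementation: embeddings are simulated Gaussian vectors instead of outputs of real models, a model trained on the target subject is assumed to shift its embeddings by +1 per dimension, the attack classifier is a plain logistic regression, and all names (`model_embedding`, `build_attack_set`, `audit_clients`, `DIM`, `N_POINTS`) are hypothetical.

```python
import math
import random

DIM = 8        # hypothetical embedding size
N_POINTS = 20  # hypothetical number of target-subject test points per model


def model_embedding(rng, trained_on_subject):
    """Mean embedding a model returns on the target subject's test points.

    Assumption: training on the subject's data shifts embeddings by +1 in
    every dimension. A real attack would query actual models instead.
    """
    shift = 1.0 if trained_on_subject else 0.0
    points = [[rng.gauss(shift, 1.0) for _ in range(DIM)] for _ in range(N_POINTS)]
    return [sum(col) / N_POINTS for col in zip(*points)]


def build_attack_set(rng, n_per_class=40):
    """Stages 1-2: simulate pre-trained 'in'/'out' models and label their embeddings."""
    data = [(model_embedding(rng, True), 1) for _ in range(n_per_class)]
    data += [(model_embedding(rng, False), 0) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_attack_classifier(data, epochs=200, lr=0.1):
    """Binary attack classifier: logistic regression fitted with plain SGD."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in data:
            g = sigmoid(logit(w, b, x)) - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def audit_clients(rng, classifier, uses_subject):
    """Stage 3: flag every client whose local-model embedding looks 'in'.

    `uses_subject` is ground truth used only to simulate the local models;
    the classifier never sees it.
    """
    w, b = classifier
    return [int(logit(w, b, model_embedding(rng, used)) >= 0.0) for used in uses_subject]


rng = random.Random(0)
classifier = train_attack_classifier(build_attack_set(rng))
# Several clients may hold the target subject's data at once.
truth = [True, False, True, True, False, False, True, False, False, True]
flags = audit_clients(rng, classifier, truth)
accuracy = sum(f == int(t) for f, t in zip(flags, truth)) / len(truth)
print(flags, accuracy)
```

Each client is classified independently, so any number of clients can be flagged at once. This mirrors the goal of detecting all clients that use the target subject's data, rather than picking a single source.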

Paper Structure

This paper contains 26 sections, 12 equations, 3 figures, and 8 tables.

Figures (3)

  • Figure 1: The pipeline of our subject-level source inference attack includes three stages: pre-training models, extracting embeddings to train the attack model, and evaluating the local models.
  • Figure 2: Attack accuracy distribution of 50 runs (subjects). Each sub-figure is captioned "attack method name-dataset." In each sub-figure, the x-axis shows the accuracy range of a run, and the y-axis shows the number of runs whose attack accuracy falls within that range.
  • Figure 3: High-dimensional embeddings projected into 2-dimensional space with t-SNE. We obtain the high-dimensional embeddings by feeding the evaluation data points from the target subject to the pre-trained and local models. The four sets of embeddings, "pretrain embedding (in)," "pretrain embedding (out)," "FL embedding (in)," and "FL embedding (out)," come from the target pre-trained, random pre-trained, target local, and random local models, respectively.
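
The per-run accuracy distribution described in the Figure 2 caption amounts to binning one accuracy value per run (subject) and counting the runs in each bin. A minimal sketch follows. The bin count, the function name `accuracy_histogram`, and the sample accuracies are hypothetical placeholders, not results from the paper.

```python
def accuracy_histogram(accuracies, n_bins=10):
    """Count how many runs fall into each equal-width accuracy range on [0, 1]."""
    counts = [0] * n_bins
    for acc in accuracies:
        # An accuracy of exactly 1.0 belongs to the last bin.
        idx = min(int(acc * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical per-run attack accuracies (one value per target subject).
example_runs = [0.55, 0.62, 0.88, 0.91, 1.0, 0.88]
counts = accuracy_histogram(example_runs)
print(counts)
```

Entry `i` of `counts` gives the y-axis value for the accuracy range from `i / n_bins` to `(i + 1) / n_bins`.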