Federated Knowledge Recycling: Privacy-Preserving Synthetic Data Sharing

Eugenio Lomurno, Matteo Matteucci

TL;DR

Federated learning can leak private information through shared models and updates in cross-silo settings. The authors propose FedKR, a data-centric FL framework that exchanges only locally generated synthetic data, combined with Dynamic Dataset Aggregation to tailor training to each participant. Through Knowledge Recycling and a sequence of synthetic-data-augmented training steps, FedKR achieves an average accuracy improvement of 4.24% over local training and demonstrates resilience against key privacy attacks, with notable gains in data-scarce medical contexts. This approach enables privacy-preserving inter-institution collaboration, providing practical benefits for healthcare and other sensitive domains while reducing the attack surface inherent in traditional FL.

Abstract

Federated learning has emerged as a paradigm for collaborative learning, enabling the development of robust models without the need to centralise sensitive data. However, conventional federated learning techniques have privacy and security vulnerabilities due to the exposure of models, parameters or updates, which can be exploited as an attack surface. This paper presents Federated Knowledge Recycling (FedKR), a cross-silo federated learning approach that uses locally generated synthetic data to facilitate collaboration between institutions. FedKR combines advanced data generation techniques with a dynamic aggregation process to provide greater security against privacy attacks than existing methods, significantly reducing the attack surface. Experimental results on generic and medical datasets show that FedKR achieves competitive performance, with an average improvement in accuracy of 4.24% compared to training models from local data, demonstrating particular effectiveness in data scarcity scenarios.
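
To make the data flow described above concrete, the following is a minimal sketch of a synthetic-only exchange, not the authors' implementation. Everything in it is an assumption made for illustration: the per-label Gaussian stands in for the paper's generative model, the names `fit_generator`, `sample_synthetic`, `run_exchange` and `make_toy_clients` are hypothetical, and the plain concatenation stands in for Dynamic Dataset Aggregation, whose actual rule is not described in this summary.

```python
import random
from statistics import mean, stdev


def fit_generator(real_samples):
    """Stand-in 'generator' (assumption): one Gaussian per label, fitted locally."""
    by_label = {}
    for value, label in real_samples:
        by_label.setdefault(label, []).append(value)
    # Each label needs at least two real samples for stdev to be defined.
    return {label: (mean(vals), stdev(vals)) for label, vals in by_label.items()}


def sample_synthetic(generator, n_per_label, rng):
    """Draw labelled synthetic samples; only these ever leave the client."""
    return [
        (rng.gauss(mu, sigma), label)
        for label, (mu, sigma) in generator.items()
        for _ in range(n_per_label)
    ]


def run_exchange(real_by_client, n_per_label=50, seed=0):
    """One round: every client shares synthetic data, then builds a training pool."""
    rng = random.Random(seed)
    # Real data and generator parameters stay local; only samples are shared.
    shared = {
        client: sample_synthetic(fit_generator(data), n_per_label, rng)
        for client, data in real_by_client.items()
    }
    # Placeholder aggregation (assumption): each client takes all shared samples.
    pools = {
        client: [s for samples in shared.values() for s in samples]
        for client in real_by_client
    }
    return shared, pools


def make_toy_clients(seed=1):
    """Two hypothetical sites with 1-D features; site_b is data-scarce."""
    rng = random.Random(seed)
    return {
        "site_a": [(rng.gauss(0.0, 1.0), 0) for _ in range(20)]
        + [(rng.gauss(3.0, 1.0), 1) for _ in range(20)],
        "site_b": [(rng.gauss(0.5, 1.0), 0) for _ in range(5)]
        + [(rng.gauss(3.5, 1.0), 1) for _ in range(5)],
    }


if __name__ == "__main__":
    shared, pools = run_exchange(make_toy_clients())
    for client, pool in pools.items():
        print(client, "trains on", len(pool), "synthetic samples")
```

In this toy setup the data-scarce `site_b` ends up with a training pool far larger than its ten real samples, which is the kind of scenario where the abstract reports the largest gains. No model, parameter or gradient is exchanged at any point, only generated samples.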

Paper Structure

This paper contains 9 sections, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: The difference between ordinary training, Federated Averaging, and the Federated Knowledge Recycling technique.
  • Figure 2: The performance achieved by the members of the federation in the identification of the optimal checkpoint of the Generator to be shared.