
Secure Federated Data Distillation

Marco Arazzi, Mert Cihangiroglu, Serena Nicolazzo, Antonino Nocera

TL;DR

This work tackles the privacy risks of centralized dataset distillation by introducing Secure Federated Data Distillation (SFDD), which decentralizes the distillation process so raw data remains local. It leverages gradient-matching distillation in a federated loop and adds LDPO-RLD, a LabelDP-based obfuscation strategy, to defend against inference attacks and data leakage from gradient updates. The framework is evaluated on five image datasets, showing distilled data quality comparable to centralized DD and demonstrating robust defense against deep leakage attacks and backdoor threats like Doorping when enough clients participate. These results highlight SFDD as a practical, privacy-preserving approach to collaborative data distillation with broad applicability to sensitive domains such as healthcare.
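This summary does not spell out how LDPO-RLD obfuscates labels. As a point of reference only, the classic LabelDP building block is k-ary randomized response (RR). RR reports the true label with probability e^ε / (e^ε + k − 1) and otherwise reports one of the other k − 1 labels uniformly at random, which satisfies ε-label differential privacy. The sketch below is a minimal illustration of plain RR, not the authors' LDPO-RLD. The helper name `randomized_response_label` and its parameters are hypothetical.

```python
import math
import random


def randomized_response_label(label: int, num_classes: int, epsilon: float,
                              rng: random.Random) -> int:
    """k-ary randomized response on a single class label (hypothetical helper).

    Keeps the true label with probability e^eps / (e^eps + k - 1); otherwise
    returns one of the remaining k - 1 labels chosen uniformly at random.
    This is generic LabelDP, NOT the LDPO-RLD mechanism from the paper.
    """
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < keep_prob:
        return label
    # Draw from the k - 1 other classes, skipping over the true label.
    other = rng.randrange(num_classes - 1)
    return other if other < label else other + 1
```

Smaller ε flips labels more often. A client would apply such a mechanism to its labels before computing any update it shares, so the updates no longer depend deterministically on the true labels.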

Abstract

Dataset Distillation (DD) is a powerful technique for reducing large datasets into compact, representative synthetic datasets, accelerating Machine Learning training. However, traditional DD methods operate in a centralized manner, which poses significant privacy threats and limits their applicability. To mitigate these risks, we propose a Secure Federated Data Distillation (SFDD) framework that decentralizes the distillation process while preserving privacy. Unlike existing Federated Distillation techniques that focus on training global models with distilled knowledge, our approach aims to produce a distilled dataset without exposing local contributions. We leverage the gradient-matching-based distillation method, adapting it to a distributed setting where clients contribute to the distillation process without sharing raw data. The central aggregator iteratively refines a synthetic dataset by integrating client-side updates while ensuring data confidentiality. To make our approach resilient to inference attacks by a server that could exploit gradient updates to reconstruct private data, we create an optimized Local Differential Privacy approach, called LDPO-RLD. Furthermore, we assess the framework's resilience against malicious clients executing backdoor attacks (such as Doorping) and demonstrate its robustness provided that a sufficient number of clients participate. Our experimental results demonstrate the effectiveness of SFDD and show that the proposed defense mitigates the identified vulnerabilities with minimal impact on the performance of the distilled dataset. By addressing the interplay between privacy and federation in dataset distillation, this work advances the field of privacy-preserving Machine Learning, making our SFDD framework a viable solution for sensitive data-sharing applications.
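The federated gradient-matching loop described above can be sketched as follows. This is a minimal, hypothetical illustration, not the SFDD implementation. It makes four assumptions:

- The model is a linear model with squared loss, so gradients have a closed form in NumPy. Gradient matching is normally applied to deep networks with a layer-wise distance.
- The two gradients are compared with a squared Euclidean distance.
- The synthetic labels `Y_syn` are fixed.
- Each client computes the matching gradient for the shared synthetic set locally, and the server averages these updates weighted by client dataset size.

All function names are illustrative.

```python
import numpy as np


def grad_wrt_weights(X, Y, W):
    """Gradient of 0.5 * ||X W^T - Y||^2 / n with respect to W.

    X: (n, d) inputs, Y: (n, c) one-hot targets, W: (c, d) linear model.
    Returns the (c, d) gradient and the (n, c) residual.
    """
    R = X @ W.T - Y
    return R.T @ X / X.shape[0], R


def matching_grad(X_syn, Y_syn, W, G_real):
    """Matching loss 0.5 * ||G_syn - G_real||_F^2 and its gradient w.r.t. X_syn."""
    G_syn, R = grad_wrt_weights(X_syn, Y_syn, W)
    D = G_syn - G_real
    loss = 0.5 * np.sum(D ** 2)
    # Closed-form derivative: X_syn enters G_syn both through R and directly.
    grad_X = (X_syn @ D.T @ W + R @ D) / X_syn.shape[0]
    return loss, grad_X


def client_update(X_local, Y_local, X_syn, Y_syn, W):
    """Runs on a client: only the matching loss and an update for X_syn leave it."""
    G_real, _ = grad_wrt_weights(X_local, Y_local, W)
    return matching_grad(X_syn, Y_syn, W, G_real)


def federated_distill(clients, X_syn, Y_syn, rounds=300, lr=0.1, seed=0):
    """Server loop: broadcast (X_syn, W), average client updates, refine X_syn.

    clients: list of (X_local, Y_local) pairs simulating the participants; in a
    real deployment the server would only see each client's returned update
    and its sample count. Returns refined inputs and per-round weighted loss.
    """
    rng = np.random.default_rng(seed)
    c, d = Y_syn.shape[1], X_syn.shape[1]
    sizes = np.array([len(X) for X, _ in clients], dtype=float)
    weights = sizes / sizes.sum()
    history = []
    for _ in range(rounds):
        # Fresh randomly initialised model each round, as in gradient matching.
        W = rng.normal(scale=0.1, size=(c, d))
        results = [client_update(X, Y, X_syn, Y_syn, W) for X, Y in clients]
        avg_grad = sum(w * g for w, (_, g) in zip(weights, results))
        history.append(float(sum(w * l for w, (l, _) in zip(weights, results))))
        X_syn = X_syn - lr * avg_grad  # server-side gradient step on synthetic data
    return X_syn, history
```

In this sketch, clients never send raw data, only `grad_X`. That update is still computed from each client's real gradient, so a curious server could try to exploit it. This is the threat the paper's LDPO-RLD defense targets.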

Paper Structure

This paper contains 13 sections, 11 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The Federated Learning workflow
  • Figure 2: The three categories of FL divided for feature and sample spaces
  • Figure 3: Dataset Distillation (DD) scheme
  • Figure 4: Secure Federated Data Distillation (SFDD) Architecture
  • Figure 5: SFDD performance with different numbers of clients