Privacy-Preserving Federated Learning via Dataset Distillation

ShiMao Xu, Xiaopeng Ke, Xing Su, Shucheng Li, Hao Wu, Sheng Zhong, Fengyuan Xu

TL;DR

This work proposes FLiP, which aims to bring the principle of least privilege (PoLP) to FL training through a local-global dataset distillation design and shows that FLiP strikes a good balance between model accuracy and privacy protection.

Abstract

Federated Learning (FL) allows users to share knowledge instead of raw data to train a model with high accuracy. Unfortunately, during training, users lose control over the knowledge they share, which causes serious data privacy issues. We hold that users are willing, and need, to share only the knowledge essential to the training task in order to obtain an FL model with high accuracy. However, existing efforts cannot help users minimize the shared knowledge according to their intent during the FL training procedure. This work proposes FLiP, which aims to bring the principle of least privilege (PoLP) to FL training. The key design of FLiP is to apply elaborate information reduction to the training data through a local-global dataset distillation design. We measure privacy performance through attribute inference and membership inference attacks. Extensive experiments show that FLiP strikes a good balance between model accuracy and privacy protection.

Paper Structure

This paper contains 25 sections, 10 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison of PoLP and existing privacy protection solutions. Ideally, the new sample generated by PoLP only contains task-relevant information.
  • Figure 2: The training procedures of vanilla FL and FLiP. In both paradigms, no training is needed on the central server. The biggest difference between our FLiP and the vanilla FL is the carrier of information aggregation, i.e., FLiP performs distilled data aggregation, and the vanilla FL performs parameter aggregation. Compared to the vanilla method, the amount of shared information during the training in FLiP is controllable.
  • Figure 3: Diagram of the local training algorithm in each client during round t and round t+1. Pass 1 represents the model update line (label alg_line:update_model), and pass 2 represents the lines from alg_line:update_s to alg_line:update_ita in the local training algorithm (label alg:fl_data_train).
  • Figure 4: Examples of the distilled samples and soft labels in different settings. (soft) L. stands for the (soft) label. The s1 and s2 denote the distilled samples. We report the top 3 elements of the corresponding soft label with confidence.
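
The contrast drawn in the Figure 2 caption, parameter aggregation versus distilled data aggregation, can be illustrated with a minimal sketch. It is not the paper's algorithm. The function names, the `Params` and `Distilled` types, the size-weighted averaging, and the choice to pool distilled pairs by simple concatenation are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical types for illustration only (not from the paper).
Params = Dict[str, List[float]]                     # layer name -> flat weights
Distilled = List[Tuple[List[float], List[float]]]   # (distilled sample, soft label)


def aggregate_parameters(client_params: List[Params],
                         client_sizes: List[int]) -> Params:
    """Vanilla FL, FedAvg-style: size-weighted average of client parameters."""
    total = sum(client_sizes)
    merged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        merged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return merged


def aggregate_distilled(client_sets: List[Distilled]) -> Distilled:
    """Assumed data-level aggregation: pool every client's distilled pairs.

    The server does no training here; it only collects what clients chose
    to share, so the amount shared is bounded by the number of pairs sent.
    """
    pooled: Distilled = []
    for distilled in client_sets:
        pooled.extend(distilled)
    return pooled


# Two toy clients: one holds 1 local example, the other holds 3.
params = aggregate_parameters(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}], client_sizes=[1, 3]
)
# Each client shares one distilled sample with a two-class soft label.
pooled = aggregate_distilled(
    [[([0.1, 0.2], [0.9, 0.1])], [([0.3, 0.4], [0.2, 0.8])]]
)
print(params)
print(len(pooled))
```

In this sketch, the information a client exposes under data-level aggregation scales with the number of distilled pairs it sends, which is one way to read the caption's claim that the amount of shared information is controllable.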