Mitigating Data Absence in Federated Learning Using Privacy-Controllable Data Digests

Chih-Fan Hsu; Ming-Ching Chang; Wei-Chao Chen

TL;DR

Across four diverse public datasets, FedDig consistently outperforms five baseline algorithms by substantial margins in various data absence scenarios, and its plugin design is inherently extensible and compatible with existing FL algorithms.

Abstract

The absence of training data, and the distribution changes that come with it, can significantly undermine model performance in federated learning (FL), especially in cross-silo scenarios. To address this challenge, we introduce the Federated Learning with Data Digest (FedDig) framework. FedDig manages unexpected distribution changes using a novel privacy-controllable data digest representation. The framework lets FL users adjust the protection level of the digest through hyperparameters that control how multiple low-dimensional features are mixed and how much differential privacy perturbation is applied to the mixed features. Evaluation of FedDig across four diverse public datasets shows that it consistently outperforms five baseline algorithms by substantial margins in various data absence scenarios. We also explore FedDig's hyperparameters thoroughly, demonstrating its adaptability. Notably, the FedDig plugin design is inherently extensible and compatible with existing FL algorithms.
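The digest idea in the abstract (mix several low-dimensional features, then perturb the mixture) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `make_digests`, the parameters `samples_per_digest` and `noise_scale`, the uniform averaging, the Gaussian noise, and the soft-label construction are all assumptions chosen for illustration, and the noise is not calibrated to any formal differential privacy budget.

```python
import random


def make_digests(features, labels, num_classes,
                 samples_per_digest=4, noise_scale=0.1, seed=0):
    """Mix groups of feature vectors and perturb each mixture with noise.

    Hypothetical sketch: returns a list of (noisy_mixed_feature, soft_label).
    """
    rng = random.Random(seed)
    order = list(range(len(features)))
    rng.shuffle(order)  # group samples at random
    digests = []
    # Drop the incomplete trailing group so every digest mixes the same count.
    for start in range(0, len(order) - samples_per_digest + 1, samples_per_digest):
        group = order[start:start + samples_per_digest]
        dim = len(features[group[0]])
        # Assumed mixing rule: coordinate-wise average of the group's features.
        mixed = [sum(features[i][d] for i in group) / samples_per_digest
                 for d in range(dim)]
        # Assumed perturbation: zero-mean Gaussian noise on every coordinate.
        noisy = [v + rng.gauss(0.0, noise_scale) for v in mixed]
        # Assumed soft label: fraction of the group that falls in each class.
        soft = [0.0] * num_classes
        for i in group:
            soft[labels[i]] += 1.0 / samples_per_digest
        digests.append((noisy, soft))
    return digests


# Toy usage with made-up two-dimensional features and binary labels.
toy_features = [[float(i), float(-i)] for i in range(8)]
toy_labels = [i % 2 for i in range(8)]
toy_digests = make_digests(toy_features, toy_labels, num_classes=2)
```

In this sketch, raising `samples_per_digest` or `noise_scale` makes each digest less informative about any single sample, which is the kind of protection-versus-utility knob the abstract describes.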

Paper Structure (15 sections, 3 equations, 16 figures, 2 tables, 1 algorithm)

Introduction
The Impact of Data Absence in Cross-Silo FL
Related Works
The FedDig Framework
The FedDig Training Framework
Generating Privacy-Controllable Data Digests
Data Digest Privacy Assessment
Experimental Results
Impact of Client Leaving and Data Distribution
Impacts on FedDig Hyperparameters
Experiments on Heterogeneous Datasets
Guidance Visualization
Direct Reverse Engineering on the Digest
Communication Cost and Training Time
Conclusion

Figures (16)

Figure 1: The FedDig framework overview: (a) For an available client $i$, the data digest $D^i$ is generated and stored at the moderator before the client joins FL training. Available clients then train as in a regular federated learning algorithm, such as FedAvg (McMahan et al., AISTATS 2017). (b) When a client $j$ is absent, the moderator synthesizes the model update $\nabla \overline{\mathcal{M}}^j_t$ for client $j$'s data using a recall model $\overline{\mathcal{M}}$ and the digest $D^j$. This approach addresses the change in training data distribution caused by client absence.
Figure 2: Test accuracy drops caused by client/data absence.
Figure 3: Training at each client and the moderator: (a) The client model $\mathcal{M}^i$ takes raw data $R^i$ and an encoded feature $r^i$ to produce a label $\tilde{y}^i$. (b) The recall model $\overline{\mathcal{M}}^j$ takes training guidance $G^j$ and data digest $D^j$ to produce a soft label $\tilde{D}_y^j$. Feature extractors $F_R$ and $F_D$ extract latent features from the raw data (or guidance) and digests, respectively. $C$ is the classifier producing the final classification.
Figure 4: The noise variances of the mixing method with $SpD=4$.
Figure 5: Model structures used in the EMNIST experiment.
...and 11 more figures
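The moderator behavior described in the Figure 1 caption can be sketched as a FedAvg-style round in which absent clients are covered by synthesized updates. This is a rough illustration under stated assumptions, not the paper's algorithm: `aggregate_round`, `synthesize_update`, and the weighting by client dataset size are hypothetical, and the callable stands in for whatever the moderator computes from the recall model and the stored digest.

```python
def aggregate_round(client_updates, client_sizes, absent_ids, synthesize_update):
    """Weighted average of per-client updates for one training round.

    client_updates:    dict of client id -> update vector (available clients only)
    client_sizes:      dict of client id -> number of training samples
    absent_ids:        set of client ids that are missing this round
    synthesize_update: hypothetical callable, client id -> update vector, standing
                       in for the moderator's use of the recall model and digest
    """
    total = sum(client_sizes.values())
    aggregated = None
    for cid, size in client_sizes.items():
        # Absent clients contribute a synthesized update instead of a real one.
        update = synthesize_update(cid) if cid in absent_ids else client_updates[cid]
        if aggregated is None:
            aggregated = [0.0] * len(update)
        weight = size / total  # FedAvg-style weighting by dataset size
        for k, value in enumerate(update):
            aggregated[k] += weight * value
    return aggregated


# Toy usage: client "b" is absent, so its update is synthesized.
toy_result = aggregate_round(
    client_updates={"a": [2.0, 0.0]},
    client_sizes={"a": 1, "b": 1},
    absent_ids={"b"},
    synthesize_update=lambda cid: [0.0, 4.0],
)
```

Keeping the absent client's weight in the average is the point of the sketch: the aggregate continues to reflect that client's share of the data distribution even though it sent nothing this round.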
