Federated Learning with Differential Privacy
Adrien Banse, Jan Kreischer, Xavier Oliva i Jürgens
TL;DR
This paper investigates privacy-preserving cross-silo federated learning with differential privacy (DP) on MNIST, FEMNIST, and a small medical dataset. It benchmarks how the number of clients and the DP settings affect convergence and accuracy, and finds that non-i.i.d. and small datasets are especially vulnerable to performance degradation. Using FedAvg with gradient perturbation implemented via PyTorch Opacus, the results show that DP can reduce final accuracy on MNIST from above $95\%$ (no DP) to around $75\%$. FEMNIST often fails to converge under DP, and the medical dataset exhibits higher variance and limited DP gains. These findings underscore a pronounced privacy-utility trade-off in federated learning and motivate exploring decentralized topologies and alternative privacy-preserving techniques. Limitations include fixed hyperparameters and a focus on small datasets.
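As a rough illustration of the two ingredients named above, the following minimal sketch shows gradient perturbation (clip, then add Gaussian noise) and FedAvg-style weighted averaging in plain Python. It is not the authors' implementation and does not use Opacus. The function names `clip_and_noise` and `fed_avg`, and the parameters `clip_norm` and `noise_multiplier`, are hypothetical names chosen for this example. The sketch also simplifies by clipping a single aggregated gradient, whereas DP-SGD clips per-sample gradients before averaging.

```python
import math
import random


def clip_and_noise(grad, clip_norm, noise_multiplier, rng):
    """Clip a gradient to L2 norm <= clip_norm, then add Gaussian noise.

    Simplification (assumption): operates on one gradient vector rather
    than on per-sample gradients as DP-SGD does.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm  # noise std scales with the clip bound
    return [g * scale + rng.gauss(0.0, sigma) for g in grad]


def fed_avg(client_weights, client_sizes):
    """Average client weight vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # One toy round: each client takes one noisy gradient step from the same
    # global weights, then the server averages the resulting weights.
    global_w = [0.0, 0.0]
    client_grads = [[3.0, 4.0], [0.3, -0.4]]  # made-up local gradients
    client_sizes = [100, 300]                 # made-up local dataset sizes
    lr = 0.1
    local_ws = []
    for grad in client_grads:
        noisy = clip_and_noise(grad, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
        local_ws.append([w - lr * g for w, g in zip(global_w, noisy)])
    print(fed_avg(local_ws, client_sizes))
```

A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, which is the privacy-utility trade-off the benchmark measures.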
Abstract
Federated learning (FL), a type of distributed machine learning, largely keeps clients' private data from being shared among parties. Nevertheless, private information can still be leaked through analysis of the parameter weights that clients upload. In this report, we present an empirical benchmark of how the number of clients and the addition of differential privacy (DP) mechanisms affect model performance on different types of data. Our results show that non-i.i.d. and small datasets suffer the largest decrease in performance in a distributed and differentially private setting.
