DP-FedPGN: Finding Global Flat Minima for Differentially Private Federated Learning via Penalizing Gradient Norm

Junkang Liu, Yuxuan Tian, Fanhua Shang, Yuanyuan Liu, Hongying Liu, Junchao Zhou, Daorui Ding

TL;DR

The paper tackles privacy-induced degradation in client-level differentially private federated learning (CL-DPFL), where gradient clipping and DP noise create sharp loss landscapes and hurt generalization. It introduces DP-FedPGN, which adds a global gradient-norm penalty to steer optimization toward global flat minima, and an optional variant, DP-FedPGN-LS, which applies Laplacian smoothing to further flatten the landscape. The authors provide convergence, sensitivity, and privacy analyses under Rényi DP, and report substantial empirical gains across CNNs, Vision Transformers, and RoBERTa on six tasks under non-IID data, with faster convergence and improved privacy-utility trade-offs. The approach is practical for large-scale, heterogeneous data in CL-DPFL and offers a principled way to mitigate DP-related degradation while preserving performance.

Abstract

Client-level Differentially Private Federated Learning (CL-DPFL) is widely used to prevent inference attacks in Federated Learning (FL) and to reduce the leakage of sensitive information. However, current CL-DPFL methods usually produce sharper loss landscapes, which degrades model generalization once differential privacy protection is applied. Popular federated learning methods alleviate this problem by using Sharpness Aware Minimization (SAM) to find a local flat minimum. However, local flatness may not reflect global flatness in CL-DPFL. To address this issue and seek global flat minima, we propose a new CL-DPFL algorithm, DP-FedPGN, which adds a global gradient norm penalty to the local loss. The penalty not only finds a flatter global minimum but also reduces the norm of local updates, which in turn reduces the error introduced by gradient clipping. From a theoretical perspective, we analyze how DP-FedPGN mitigates the performance degradation caused by DP. The proposed DP-FedPGN algorithm also eliminates the impact of data heterogeneity and achieves fast convergence. We use Rényi DP to provide strict privacy guarantees and provide a sensitivity analysis for local updates. Finally, we evaluate DP-FedPGN on both ResNet and Transformer models and achieve significant improvements over existing state-of-the-art algorithms on six vision and natural language processing tasks. The code is available at https://github.com/junkangLiu0/DP-FedPGN
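
The abstract's point about clipping error can be made concrete with a minimal sketch of a generic client-level DP aggregation step: each client update is clipped to a fixed L2 norm, the clipped updates are averaged, and Gaussian noise is added. This is not the authors' implementation. The function names, the toy updates, and the noise calibration (standard deviation `noise_multiplier * clip_norm / num_clients` on the average) are assumptions chosen for illustration.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale a client update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def dp_aggregate(updates, clip_norm, noise_multiplier, rng):
    """Average clipped client updates and add Gaussian noise.

    Assumption: the noise std on the average is
    noise_multiplier * clip_norm / num_clients.
    """
    clipped = [clip_update(u, clip_norm) for u in updates]
    n = len(clipped)
    dim = len(clipped[0])
    mean = [sum(c[j] for c in clipped) / n for j in range(dim)]
    std = noise_multiplier * clip_norm / n
    return [m + rng.gauss(0.0, std) for m in mean]

# Toy client updates (hypothetical values): two exceed the clip norm, one does not.
rng = random.Random(0)
updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
clipped = [clip_update(u, 1.0) for u in updates]
noisy_mean = dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
noiseless_mean = dp_aggregate(updates, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```

In this toy run the update with norm 0.5 passes through unchanged, while the updates with norms 5 and 10 are shrunk to norm 1. Updates with smaller norms lose less information to clipping, which is the effect the abstract attributes to the gradient norm penalty.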

Paper Structure

This paper contains 25 sections, 11 theorems, 65 equations, 8 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Under Assumptions 1, 3, and 4, if we take $g^0=0$, then DP-FedPGN converges, with a bound that depends on $G_0:=\frac{1}{N} \sum_{i=1}^N\left\|\nabla f_i\left(\tilde{x}^0\right)\right\|^2$.

Figures (8)

  • Figure 1: The global loss surface for DP-FedAvg [McMahan2018learning], DP-FedSAM [DP_FedSAM], and the proposed DP-FedPGN method with ResNet-18 on CIFAR100 in the case of Dirichlet-0.1.
  • Figure 2: Loss landscapes of global and local models trained by DP-FedSAM and DP-FedPGN with ResNet-18 on CIFAR100 in the case of Dirichlet-0.1. DP-FedPGN global model is flatter than DP-FedSAM global model.
  • Figure 3: Comparison of loss landscapes of DP-FedAvg (top), DP-FedSAM (middle), and DP-FedPGN (bottom) on CIFAR10 with ResNet-18 in the case of Dirichlet-0.1.
  • Figure 4: This example illustrates the relationship between the gradient norm of the loss and the flatness of its landscape. The smaller the gradient norm, the flatter the loss landscape. DP-FedPGN has a smaller gradient norm than DP-FedSAM and DP-FedAvg. Moreover, the flat minimum of DP-FedPGN is closer than those of the other methods under data heterogeneity settings.
  • Figure 5: Convergence plots on CIFAR10 and CIFAR100 (Dirichlet-0.1 and Dirichlet-0.6) with ResNet-18 and ResNet-10.
  • ...and 3 more figures
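
The link between gradient norm and flatness described for Figure 4 can be checked on a toy quadratic loss. The sketch below is not the DP-FedPGN algorithm, which penalizes a global gradient norm across clients. It assumes a single loss $L(w) = \frac{1}{2}\sum_j a_j w_j^2$ and the generic penalized objective $L(w) + \lambda \|\nabla L(w)\|$, whose gradient is $\nabla L(w) + \lambda H g / \|g\|$ with $g = \nabla L(w)$ and $H$ the Hessian. The Hessian-vector product is estimated by a finite difference of gradients. All names and values are hypothetical.

```python
import math

def grad(w, curv):
    """Gradient of the toy loss L(w) = 0.5 * sum(curv[j] * w[j]**2)."""
    return [a * x for a, x in zip(curv, w)]

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def penalized_grad(w, curv, lam, r=1e-3):
    """Approximate gradient of L(w) + lam * ||grad L(w)||.

    The penalty's gradient is H g / ||g||, estimated by differencing
    gradients along the normalized gradient direction with step r.
    """
    g = grad(w, curv)
    g_norm = norm(g)
    if g_norm == 0.0:
        return g
    w_shift = [x + r * gi / g_norm for x, gi in zip(w, g)]
    g_shift = grad(w_shift, curv)
    return [gi + lam * (gs - gi) / r for gi, gs in zip(g, g_shift)]

# Same offset from the minimizer, different curvature.
w = [0.5, 0.5]
sharp_norm = norm(grad(w, [10.0, 10.0]))  # sharp basin
flat_norm = norm(grad(w, [1.0, 1.0]))     # flat basin

# Mixed curvature: the penalty adds more along the sharp direction.
plain = grad(w, [10.0, 1.0])
pg = penalized_grad(w, [10.0, 1.0], lam=0.1)
extra = [p - g for p, g in zip(pg, plain)]
```

At the same distance from the minimizer, the sharp basin has a gradient norm ten times that of the flat basin. In the mixed-curvature case, the extra term contributed by the penalty is much larger along the high-curvature coordinate, so a penalized step moves away from sharp directions more aggressively.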

Theorems & Definitions (17)

  • Theorem 1: Convergence for non-convex functions
  • Lemma 1
  • Theorem 2
  • Lemma 2
  • Lemma 3
  • proof
  • Lemma 4
  • Lemma 5
  • proof
  • Lemma 6
  • ...and 7 more