Mitigating Noise Detriment in Differentially Private Federated Learning with Model Pre-training

Huitong Jin, Yipeng Zhou, Quan Z. Sheng, Shiting Wen, Laizhong Cui

TL;DR

This work tackles the privacy-utility trade-off in differentially private federated learning (DPFL) by leveraging pre-trained models. It introduces Pretrain-DPFL, which automatically selects the best fine-tuning strategy under DP at no additional privacy cost: head-tuning (HT), full-tuning (FT), or unified-tuning (UT), which runs HT and then FT. Through convergence analysis for smooth non-convex losses, the authors derive conditions for choosing between HT and FT, and implement UT to switch strategies adaptively, guided by server-side estimates. Experiments on multiple datasets report $25.22\%$ higher accuracy than scratch training and an $8.19\%$ gain over the second-best baseline, supporting the framework's ability to mitigate DP noise and improve the privacy-utility balance in DPFL.

Abstract

Differentially Private Federated Learning (DPFL) strengthens privacy protection by perturbing model gradients with noise, though at the cost of reduced accuracy. Although prior empirical studies indicate that initializing from pre-trained rather than random parameters can alleviate noise disturbance, the problem of optimally fine-tuning pre-trained models in DPFL remains unaddressed. In this paper, we propose Pretrain-DPFL, a framework that systematically evaluates the three most representative fine-tuning strategies: full-tuning (FT), head-tuning (HT), and unified-tuning (UT), which combines HT followed by FT. Through convergence analysis under smooth non-convex loss, we establish theoretical conditions for identifying the optimal fine-tuning strategy in Pretrain-DPFL, thereby maximizing the benefits of pre-trained models in mitigating noise disturbance. Extensive experiments across multiple datasets demonstrate Pretrain-DPFL's superiority, achieving $25.22\%$ higher accuracy than scratch training and outperforming the second-best baseline by $8.19\%$, significantly improving the privacy-utility trade-off in DPFL.
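
The three strategies differ only in which parameters are updated in a given round. The sketch below illustrates that distinction and is not the authors' implementation. It assumes the model is flattened into a backbone block followed by a head block, and it uses a fixed, hypothetical `switch_round` for UT; the paper instead guides the switch with server-side estimates, which are not reproduced here.

```python
import numpy as np

def trainable_mask(strategy, round_idx, n_backbone, n_head, switch_round=0):
    """Boolean mask over a flattened [backbone, head] parameter vector.

    strategy     -- "HT" (head only), "FT" (all parameters), or "UT" (HT, then FT)
    switch_round -- hypothetical fixed round at which UT moves from HT to FT;
                    an assumption made for illustration only
    """
    if strategy == "HT":
        train_backbone = False
    elif strategy == "FT":
        train_backbone = True
    elif strategy == "UT":
        # Head-tuning before the switch, full-tuning from the switch onward.
        train_backbone = round_idx >= switch_round
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    backbone = np.full(n_backbone, train_backbone, dtype=bool)
    head = np.ones(n_head, dtype=bool)  # the head is trained in every strategy
    return np.concatenate([backbone, head])

# Number of trainable parameters under UT before and after the assumed switch.
for t in (0, 63, 64, 127):
    mask = trainable_mask("UT", t, n_backbone=1000, n_head=10, switch_round=64)
    print(t, int(mask.sum()))
```

With these illustrative sizes, only the 10 head parameters are updated before round 64 and all 1010 parameters afterwards.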

Paper Structure

This paper contains 11 sections, 6 theorems, 5 equations, 3 figures, and 1 algorithm.

Key Result

Theorem 1

Laplace Mechanism [wei2020federated]. For a query with $l_1$-sensitivity, assuming that gradients are $l_1$-bounded by $\xi_1$, the Laplace mechanism ensures $(\epsilon_i, 0)$-DP by adding $\mathbf{\tilde{w}}_t^i \sim \text{Lap}\left(0, \frac{2T\xi_1}{d_i \epsilon_i} \mathbb{I}_{n}\right)$, ...
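
As a rough illustration of the noise level in the theorem, the sketch below draws Laplace noise with the stated parameter $\frac{2T\xi_1}{d_i \epsilon_i}$ and adds it to a gradient. Several points are assumptions rather than details confirmed by the text above: the second argument of $\text{Lap}$ is read as the scale, $d_i$ is taken to be the size of client $i$'s local dataset, and the $l_1$ bound $\xi_1$ is enforced by clipping. The function names are hypothetical.

```python
import numpy as np

def clip_l1(grad, xi1):
    """Rescale grad so that its l1 norm is at most xi1 (assumed clipping step)."""
    norm = np.sum(np.abs(grad))
    if norm == 0.0:
        return grad
    return grad * min(1.0, xi1 / norm)

def laplace_scale(T, xi1, d_i, eps_i):
    """Noise parameter 2*T*xi1 / (d_i * eps_i), read here as the Laplace scale."""
    return 2.0 * T * xi1 / (d_i * eps_i)

def perturb_gradient(grad, T, xi1, d_i, eps_i, rng):
    """Clip the gradient, then add i.i.d. Laplace noise to every coordinate."""
    clipped = clip_l1(grad, xi1)
    scale = laplace_scale(T, xi1, d_i, eps_i)
    noise = rng.laplace(loc=0.0, scale=scale, size=clipped.shape)
    return clipped + noise

rng = np.random.default_rng(0)
grad = rng.normal(size=8)  # stand-in for a client gradient
noisy = perturb_gradient(grad, T=128, xi1=1.0, d_i=1000, eps_i=30.0, rng=rng)
print(laplace_scale(T=128, xi1=1.0, d_i=1000, eps_i=30.0))
```

Because the scale grows linearly in $T$ and shrinks with $d_i \epsilon_i$, doubling the number of global iterations at a fixed budget doubles the per-round noise scale under this reading.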

Figures (3)

  • Figure 1: Comparison of model accuracy across different privacy budgets using a CNN model, with the number of global training iterations set to $T = 128$.
  • Figure 2: Comparison of model accuracy across different privacy budgets using a ResNet20 model, with the number of global training iterations set to $T = 128$.
  • Figure 3: Comparison of model accuracy across different numbers of total iterations using a CNN model, with $\epsilon$ set to 30 for the Laplace mechanism and 0.5 for the Gaussian mechanism.

Theorems & Definitions (11)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 1
  • Proposition 1
  • Definition 5
  • Lemma 1
  • Lemma 2
  • Theorem 2
  • ...and 1 more