Characterizing the Training Dynamics of Private Fine-tuning with Langevin diffusion

Shuqi Ke; Charlie Hou; Sewoong Oh; Giulia Fanti

TL;DR

This work analyzes the training dynamics of privately fine-tuning pretrained backbones and shows that differentially private full fine-tuning (DP-FFT) can distort backbone representations because the backbone is misaligned with a randomly initialized linear head. It introduces a zeroth-order approximation of Langevin diffusion that preserves multi-layer interactions while keeping the analysis of DP-SGD tractable, and shows that a preliminary phase of differentially private linear probing (DP-LP) mitigates early feature distortion by aligning the head with the backbone representations. The authors derive convergence bounds for both DP-LP and DP-FFT in a simple 2-layer ReLU setting and provide a theory-backed budget-allocation framework indicating when to favor LP versus FFT under privacy constraints. Experiments on real datasets are consistent with the theory, illustrate the distortion-and-alignment dynamics, and show the resulting privacy-utility trade-offs across architectures and benchmarks. The results offer principled guidance for designing multi-phase private fine-tuning strategies.
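For orientation, one common way to write a Langevin diffusion and its link to noisy gradient descent is sketched below. The symbols $\mathcal{L}$ (training loss), $\sigma$ (noise scale), $\eta$ (step size), and $W_t$ (standard Brownian motion) are assumptions made for illustration; the paper's Definition 2.1 may use a different normalization and also accounts for gradient clipping, which this sketch omits.

```latex
% Langevin diffusion (continuous time), one common normalization:
d\theta_t = -\nabla \mathcal{L}(\theta_t)\,dt + \sqrt{2\sigma^2}\,dW_t

% Euler--Maruyama discretization with step size \eta:
\theta_{k+1} = \theta_k - \eta\,\nabla \mathcal{L}(\theta_k) + \sqrt{2\eta\sigma^2}\,\xi_k,
\qquad \xi_k \sim \mathcal{N}(0, I)
```

The discretized update has the shape of a noisy gradient step, which is why a diffusion of this kind can serve as a continuous-time model of DP-SGD.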

Abstract

We show, both theoretically and empirically, that differentially private full fine-tuning (DP-FFT) can distort pre-trained backbone features. We identify the cause of the distortion as the misalignment between the pre-trained backbone and the randomly initialized linear head. We prove that a sequential fine-tuning strategy, first linear probing and then full fine-tuning (DP-LP-FFT), can mitigate the feature distortion. A new approximation scheme allows us to derive approximate upper and lower bounds on the training loss of DP-LP and DP-FFT in a simple but canonical setting: 2-layer neural networks with ReLU activation. Experiments on real-world datasets and architectures are consistent with our theoretical insights. We also derive new upper bounds for 2-layer linear networks without the approximation. Moreover, our theory suggests a trade-off in how the privacy budget is allocated across phases in multi-phase fine-tuning methods like DP-LP-FFT.
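The two-phase strategy described above can be illustrated with a minimal sketch on a 2-layer ReLU network. This is not the authors' implementation: the function names, the squared loss, the full-batch updates, and the use of a step fraction (`lp_fraction`) as a stand-in for the privacy-budget split are all assumptions made for illustration, and no privacy accounting is performed.

```python
import math
import random

def forward(W, v, x):
    """Two-layer ReLU network: features h = ReLU(W x), prediction v . h."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]
    return h, sum(vj * hj for vj, hj in zip(v, h))

def per_example_grad(W, v, x, y, train_backbone):
    """Gradient of 0.5 * (prediction - y)^2 for a single example."""
    h, out = forward(W, v, x)
    err = out - y
    g_v = [err * hj for hj in h]
    if train_backbone:
        g_W = [[err * vj * (1.0 if hj > 0 else 0.0) * xi for xi in x]
               for vj, hj in zip(v, h)]
    else:
        g_W = [[0.0] * len(x) for _ in W]  # linear probing: backbone frozen
    return g_W, g_v

def dp_sgd_step(W, v, data, lr, clip, sigma, train_backbone, rng):
    """One full-batch DP-SGD step: clip each example's gradient, add Gaussian noise."""
    n = len(data)
    sum_W = [[0.0] * len(row) for row in W]
    sum_v = [0.0] * len(v)
    for x, y in data:
        g_W, g_v = per_example_grad(W, v, x, y, train_backbone)
        norm = math.sqrt(sum(g * g for row in g_W for g in row)
                         + sum(g * g for g in g_v))
        scale = min(1.0, clip / (norm + 1e-12))  # clip to norm at most `clip`
        for j, row in enumerate(g_W):
            for k, g in enumerate(row):
                sum_W[j][k] += scale * g
        for j, g in enumerate(g_v):
            sum_v[j] += scale * g
    std = sigma * clip  # noise scale proportional to the clipping threshold
    new_v = [vj - lr * (s + rng.gauss(0.0, std)) / n for vj, s in zip(v, sum_v)]
    if train_backbone:
        new_W = [[w - lr * (s + rng.gauss(0.0, std)) / n for w, s in zip(row, srow)]
                 for row, srow in zip(W, sum_W)]
    else:
        new_W = [list(row) for row in W]  # backbone untouched during LP
    return new_W, new_v

def dp_lp_then_fft(W, v, data, total_steps, lp_fraction, lr, clip, sigma, rng):
    """Run DP-LP for the first lp_fraction of the steps, then DP-FFT (hypothetical split)."""
    lp_steps = round(total_steps * lp_fraction)
    for step in range(total_steps):
        W, v = dp_sgd_step(W, v, data, lr, clip, sigma, step >= lp_steps, rng)
    return W, v
```

In this sketch, `lp_fraction=0.0` corresponds to DP-FFT alone and `lp_fraction=1.0` to DP-LP alone; intermediate values give a DP-LP-FFT schedule. With equal noise per step, the step split is only a rough proxy for how a privacy budget would be divided between phases.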

Paper Structure (40 sections, 43 theorems, 185 equations, 10 figures, 2 tables)

Introduction
Related Work
Continuous modeling of differentially private fine-tuning
Notation.
DP-SGD Dynamics.
Zeroth order approximation
Representation Alignment
Theory
Experiments on Representation Alignment
DP Fine-tuning Convergence Rates
Privacy guarantees
Convergence Rates under the Zeroth-order Approximation
Theory without the zeroth-order approximation (2-layer linear network)
Budget Allocation between DP-LP and DP-FFT
Results under Zeroth-order Approximation
...and 25 more sections

Key Result

Theorem 2.2

Denote the model parameter vector in original Langevin diffusion as $\theta_t$, and its zeroth-order approximated version as $\tilde{\theta}$. For any training time $t>0$ and clipping threshold $C>0$,

Figures (10)

Figure 1: Linear probing (LP) freezes the lower layers and optimizes the last linear layer while full fine-tuning (FFT) optimizes the whole network.
Figure 2: Left: Backbone feature quality evaluated by top-1 kNN accuracy on the downstream task, for ResNet-50, through public pre-training on ImageNet-1K and differentially private fine-tuning on STL-10. Right: Privacy budget trade-off in DP-LP-FFT, predicted in our theory, for WideResNet-16-4 on CIFAR-10 [dprandp]. For a detailed explanation, refer to
Figure 3: Visualization of the separable-data assumption (\ref{asp:separable-data}).
Figure 4: We pre-train (BYOL) a ResNet-50 backbone on ImageNet-1K and DP fine-tune (DP-SGD, $\epsilon=1$) it on STL-10. We qualitatively evaluate the features in the ResNet-50 backbone by visualizing the backbone mappings (penultimate layer outputs) of data points via UMAP (McInnes et al., 2020). These results suggest that DP-FFT distorts feature quality before improving it, as predicted by Theorem 3.3 (random initialization causes feature distortion).
Figure 5: UMAP of penultimate-layer features on a subset of MNIST (labels {0,3,7}). We run $q$ DP-LP epochs ($q\in\{0,10,20\}$) before 5 epochs of DP-FFT. We visualize the features at the end of non-private pretraining, and the end of DP fine-tuning. We observe that DP-FFT alone (2nd from the left, DP-LP-FFT steps=5) has more feature distortion than when we first run some DP-LP steps (2 rightmost figures).
...and 5 more figures

Theorems & Definitions (83)

Definition 2.1: Langevin diffusion (Ganesh et al., 2023)
Theorem 2.2: Zeroth order approximation error
Theorem 3.3: Random initialization causes feature distortion
Theorem 3.4: DP-LP first mitigates feature distortion
Corollary 3.5: Non-DP feature distortion
Theorem 4.1: Rényi privacy guarantee (Ganesh et al., 2023)
Theorem 4.2: Approximate DP-LP loss convergence
Theorem 4.3: Approximate DP-FFT loss convergence
Theorem 4.4: DP-LP loss convergence
Theorem 4.5: DP-FFT loss convergence
...and 73 more
