Distributed DP-Helmet: Scalable Differentially Private Non-interactive Averaging of Single Layers

Moritz Kirschte; Sebastian Meiser; Saman Ardalan; Esfandiar Mohammadi

TL;DR

Distributed DP-Helmet is a non-interactive, scalable framework for differentially private learning across many users via blind averaging. In Phase I, each user trains a model locally (Softmax-SLP or SVM) and noises it once; in Phase II, a single secure summation computes the average, yielding an $(\varepsilon,\delta)$-DP global model. The authors prove the DP guarantees and show that blind averaging of hinge-loss SVMs converges in the limit to a centrally trained SVM. Experiments on CIFAR-10, CIFAR-100, and federated EMNIST after SimCLR pretraining show strong utility at small privacy budgets, with Softmax-SLP often outperforming SVM, and an ablation indicates resilience to strongly non-IID data. The convergence result rests on the representer theorem and serves as a blueprint for convergence of other ERMs such as Softmax-SLP; the design targets scalability to millions of users.

Abstract

In this work, we propose two differentially private, non-interactive, distributed learning algorithms in a framework called Distributed DP-Helmet. Our framework is based on what we coin blind averaging: each user locally learns and noises a model, and all users then jointly compute the mean of their models via a secure summation protocol. We provide experimental evidence that blind averaging for SVMs and single softmax layers (Softmax-SLP) can have a strong utility-privacy tradeoff: we reach an accuracy of 86% on CIFAR-10 for $\varepsilon = 0.4$ and 1,000 users, of 44% on CIFAR-100 for $\varepsilon = 1.2$ and 100 users, and of 39% on federated EMNIST for $\varepsilon = 0.4$ and 3,400 users, all after a SimCLR-based pretraining. As an ablation, we study the resilience of our approach to a strongly non-IID setting. On the theoretical side, we show that blind averaging preserves differential privacy if the objective function is smooth, Lipschitz, and strongly convex, as is the case for SVMs. We show that these properties also hold for Softmax-SLP, which is often used for last-layer fine-tuning, such that for a fixed model size the privacy bound $\varepsilon$ of Softmax-SLP no longer depends on the number of classes. This marks a significant advantage in utility and privacy of Softmax-SLP over SVMs. Furthermore, in the limit, blind averaging of hinge-loss SVMs converges to a centrally trained SVM. The latter result is based on the representer theorem and can be seen as a blueprint for finding convergence for other empirical risk minimizers (ERMs) such as Softmax-SLP.
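The two-phase idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the function names, the synthetic three-class data, the training hyperparameters, and in particular the noise scale `sigma` are all assumptions made here. `sigma` is a placeholder and is not calibrated to any $(\varepsilon,\delta)$ budget, and the secure summation is replaced by a plain sum.

```python
import numpy as np

def train_softmax_slp(X, y, num_classes, lam=0.1, lr=0.5, epochs=100):
    """Full-batch gradient descent on L2-regularized softmax cross-entropy."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    Y = np.eye(num_classes)[y]                       # one-hot labels
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = X.T @ (P - Y) / n + lam * W           # loss gradient + regularizer
        W -= lr * grad
    return W

def noisy_local_model(X, y, num_classes, sigma, rng):
    """Phase I (sketch): train locally, then add Gaussian noise once.
    sigma is an uncalibrated placeholder, NOT a DP-calibrated noise scale."""
    W = train_softmax_slp(X, y, num_classes)
    return W + rng.normal(0.0, sigma, size=W.shape)

def blind_average(models):
    """Phase II (sketch): a plain sum stands in for the secure summation."""
    return sum(models) / len(models)

def demo(num_users=20, per_user=50, sigma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    centers = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])  # synthetic classes
    models = []
    for _ in range(num_users):
        y = rng.integers(0, 3, size=per_user)
        X = centers[y] + rng.normal(0.0, 0.5, size=(per_user, 2))
        models.append(noisy_local_model(X, y, 3, sigma, rng))
    W_avg = blind_average(models)
    y_test = rng.integers(0, 3, size=500)
    X_test = centers[y_test] + rng.normal(0.0, 0.5, size=(500, 2))
    acc = np.mean(np.argmax(X_test @ W_avg, axis=1) == y_test)
    return W_avg, acc
```

Calling `demo()` returns the averaged weight matrix and its accuracy on fresh synthetic data. Users never exchange gradients or intermediate models, which is what makes the scheme non-interactive.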

Paper Structure (37 sections, 24 theorems, 51 equations, 10 figures, 2 tables, 2 algorithms)

This paper contains 37 sections, 24 theorems, 51 equations, 10 figures, 2 tables, 2 algorithms.

Introduction
Related Work
Preliminaries
Differential Privacy
Secure Summation
Pretraining to boost DP Performance
Dual SVM representation
Configuration
DP ERMs via SGD Training
Phase I: Differentially Private Softmax-SLP
System Design of Distributed DP-Helmet
Security of Distributed DP-Helmet
Phase II: Non-interactive Blind Average
Experimental Results
Experimental Setup
...and 22 more sections

Key Result

Theorem 3.6

Let $s_1, \dots, s_w$ be the $d$-dimensional inputs of the clients $U^{(1)}, \dots, U^{(w)}$. Let $\mathcal{F}$ be the ideal secure summation function: $\mathcal{F}(s_1, \dots, s_w) \coloneqq \sum_{i=1}^w s_i$. If secure authentication encryption schemes and authenticated key agreement protocol exis
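To make the ideal functionality $\mathcal{F}$ concrete, the toy sketch below shows one classic way a sum can be revealed without revealing individual inputs: pairwise masks that cancel in the aggregate. It is an illustration only and is not the protocol $\pi_{SecAgg}$ the theorem refers to. The modulus `MOD`, the function names, and the use of a seeded non-cryptographic random generator in place of key agreement are assumptions made here, and inputs are assumed to be integer vectors (e.g. fixed-point encoded model weights).

```python
import random

MOD = 2**32  # hypothetical modulus; inputs are assumed to be integers

def mask_inputs(inputs, seed=0):
    """Each pair (i, j) with i < j shares a random mask m:
    client i adds m, client j subtracts m, so all masks cancel in the sum.
    random.Random is NOT cryptographically secure; illustration only."""
    rng = random.Random(seed)
    w, d = len(inputs), len(inputs[0])
    masked = [[x % MOD for x in s] for s in inputs]
    for i in range(w):
        for j in range(i + 1, w):
            for k in range(d):
                m = rng.randrange(MOD)
                masked[i][k] = (masked[i][k] + m) % MOD
                masked[j][k] = (masked[j][k] - m) % MOD
    return masked

def secure_sum(masked):
    """The aggregator only sees masked vectors and adds them coordinate-wise."""
    d = len(masked[0])
    return [sum(v[k] for v in masked) % MOD for k in range(d)]

inputs = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masked = mask_inputs(inputs)
total = secure_sum(masked)  # equals the coordinate-wise sum of inputs mod MOD
```

A real protocol additionally has to derive the masks via key agreement and handle dropouts and colluding parties, none of which this sketch attempts.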

Figures (10)

Figure 1: Schematic overview of Distributed DP-Helmet (system design in \ref{sec:dist_helmet}). (Phase I) Each user locally trains a model, e.g. our Softmax-SLP (\ref{alg:dpsoftmax} in \ref{sec:softmax}) or an SVM (\ref{alg:dpsvm}), via a learning algorithm $T$, and noises the model once. (Phase II) A single secure summation step results in an averaged and $(\varepsilon,\delta)$-DP model. For hinge-loss SVMs, this blind average converges in the limit (cf. \ref{sec:noninteractive}). In our experiments (cf. \ref{sec:results}), local data is extracted by the feature extractor SimCLR trained on public data.
Figure 2: Main result (detailed plot: \ref{fig:all_exps}). Classification accuracy compared to privacy budget $\varepsilon$ of Distributed DP-Helmet ($\delta = 10^{-5}$, $t=50\,\%$ honest users) on SVM and Softmax-SLP variants (cf. \ref{sec:dist_helmet}) and DP-SGD-based federated learning (FL). Only ours is non-interactive. The line of SVM-SGD for CIFAR-100 represents an upper bound for the $100$ user performance. We spread the entire dataset among the users.
Figure 3: Pretraining: Schematic overview. Dashed lines denote data flow during training and solid lines during inference.
Figure 4: Blind averaging works on strongly biased non-IID datasets: Local SVM hyperplanes (solid) and their margins (dotted) on a point cloud $x$ for (a) user 1 and (b) user 2. (c) After averaging (green), they approximate the global SVM (black) trained on the combined point cloud, i.e. both SVMs overlap. (d) The parameters $f$ (normal vectors) of each SVM illustrate the average $\mathrm{avg}(f) = 0.5\cdot(f_1 + f_2)$. Hyperparameters: $\Lambda=20, R=1, c=5, n^{(1)}=n^{(2)}=250, \mathrm{bs}=25, \mathrm{epochs}=500$.
Figure 5: CIFAR-10 accuracy vs. number of users with $50$ data points per user. We set $\varepsilon = 0.6$ and $\delta = 10^{-5}$. FL values are interpolated.
...and 5 more figures
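
The two-user average in the caption of Figure 4 extends directly to $w$ users. As a hedged worked equation, assume purely for illustration that each user $i$ adds independent Gaussian noise $\eta_i \sim \mathcal{N}(0, \sigma^2 I_d)$ to its local model $f_i$; the symbols $w$, $\sigma$, and $\eta_i$ are chosen here and need not match the noise calibration in the paper's algorithms.

```latex
\mathrm{avg}(f_1 + \eta_1, \dots, f_w + \eta_w)
  = \frac{1}{w}\sum_{i=1}^{w} f_i \;+\; \frac{1}{w}\sum_{i=1}^{w} \eta_i,
\qquad
\frac{1}{w}\sum_{i=1}^{w} \eta_i \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{w}\, I_d\right).
```

For $w = 2$ and no noise this reduces to $0.5\cdot(f_1 + f_2)$. Under the stated assumption, the per-coordinate noise variance of the average shrinks by a factor of $w$.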

Theorems & Definitions (52)

Definition 3.1: $\approx_{\varepsilon,\delta}$ relation
Definition 3.2: Differential Privacy
Definition 3.3: Randomized Sensitivity
Definition 3.4: Computational $\approx_{\varepsilon,\delta}^c$ Differential Privacy
Definition 3.5: Secure Summation
Theorem 3.6: Secure Aggregation $\pi_{SecAgg}$ in the semi-honest setting exists (Bell et al., 2020)
Theorem 3.7: Representer theorem, cf. Argyriou et al. (2009), Lem. 3, Thm. 8
Definition 3.8: Configuration $\zeta$
Definition 3.9: Strong convexity
Definition 3.10: Lipschitzness
...and 42 more
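
For orientation, the standard textbook forms of three of the notions listed above are given below. The paper's own definitions may differ in notation or generality; the symbols $M$, $D$, $D'$, $S$, $\lambda$, and $L$ are chosen here for illustration.

```latex
% (epsilon, delta)-differential privacy of a mechanism M:
% for all neighboring datasets D, D' and all measurable output sets S,
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta .

% lambda-strong convexity of f (no differentiability needed):
u \mapsto f(u) - \tfrac{\lambda}{2}\,\lVert u \rVert_2^2 \quad \text{is convex}.

% L-Lipschitzness of f:
\lvert f(u) - f(v) \rvert \;\le\; L \,\lVert u - v \rVert_2 \quad \text{for all } u, v .
```

An L2 regularizer $\tfrac{\lambda}{2}\lVert u \rVert_2^2$ added to a convex loss makes the objective $\lambda$-strongly convex, which is the usual route by which regularized SVM objectives meet this condition.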
