Near Exact Privacy Amplification for Matrix Mechanisms

Christopher A. Choquette-Choo; Arun Ganesh; Saminul Haque; Thomas Steinke; Abhradeep Thakurta

TL;DR

This work addresses the challenge of obtaining tight privacy guarantees for DP training that combines privacy amplification via random batching with noise correlated across rounds by a matrix $\mathbf{C}$. It introduces near-exact privacy accounting based on Monte Carlo methods, together with a balls-in-bins batching scheme, which enables practical amplification for general non-negative, lower-triangular $\mathbf{C}$ and circumvents composition. By formulating an optimization problem over $\mathbf{C}$ (often restricted to Toeplitz matrices) and calibrating the noise scale via Monte Carlo accounting, the approach achieves lower RMSE on prefix-sum tests and practical gains on CIFAR-10 compared with state-of-the-art banded and Poisson-based methods. The results indicate that near-exact, amplification-aware optimization of correlated noise can improve utility in DP machine learning while remaining scalable and implementable in modern training pipelines.

Abstract

We study the problem of computing the privacy parameters for DP machine learning when using privacy amplification via random batching and noise correlated across rounds via a correlation matrix $\mathbf{C}$ (i.e., the matrix mechanism). Past work on this problem either only applied to banded $\mathbf{C}$, or gave loose privacy parameters. In this work, we give a framework for computing near-exact privacy parameters for any lower-triangular, non-negative $\mathbf{C}$. Our framework allows us to optimize the correlation matrix $\mathbf{C}$ while accounting for amplification, whereas past work could not. Empirically, we show this lets us achieve smaller RMSE on prefix sums than the previous state-of-the-art (SOTA). We also show that we can improve on the SOTA performance on deep learning tasks. Our two main technical tools are (i) using Monte Carlo accounting to bypass composition, which was the main technical challenge for past work, and (ii) a "balls-in-bins" batching scheme that enables easy privacy analysis and is closer to practical random batching than Poisson sampling.
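To make the prefix-sum RMSE objective mentioned in the abstract concrete, here is a minimal sketch of the standard, unamplified matrix-mechanism error. It assumes the common factorization convention $\mathbf{A} = \mathbf{B}\mathbf{C}$ with $\mathbf{B} = \mathbf{A}\mathbf{C}^{-1}$, a single participation per example (so the sensitivity is the largest column norm of $\mathbf{C}$), and no amplification. The helper names (`toeplitz_lower`, `lower_tri_inverse`, `prefix_sum_rmse`) are hypothetical and do not come from the paper, whose amplified accounting is more involved.

```python
import math


def toeplitz_lower(coeffs):
    """Build a lower-triangular Toeplitz matrix from its first column."""
    n = len(coeffs)
    return [[coeffs[i - j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def lower_tri_inverse(C):
    """Invert a lower-triangular matrix by forward substitution, column by column."""
    n = len(C)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            rhs = 1.0 if i == j else 0.0
            partial = sum(C[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = (rhs - partial) / C[i][i]
    return inv


def matmul(X, Y):
    """Plain dense matrix product for small square matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def prefix_sum_rmse(C, sigma):
    """Unamplified RMSE of the matrix mechanism on prefix sums.

    Assumptions (not from the paper): A is the all-ones lower-triangular
    prefix-sum matrix, B = A C^{-1}, and each example participates once,
    so the sensitivity is the maximum column L2 norm of C.
    """
    n = len(C)
    A = [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]
    B = matmul(A, lower_tri_inverse(C))
    sensitivity = max(math.sqrt(sum(C[i][j] ** 2 for i in range(n))) for j in range(n))
    frobenius = math.sqrt(sum(v * v for row in B for v in row))
    return sigma * sensitivity * frobenius / math.sqrt(n)


# Compare independent noise (C = I) with a decaying Toeplitz choice of C.
identity = toeplitz_lower([1.0, 0.0, 0.0, 0.0])
decaying = toeplitz_lower([1.0, 0.5, 0.25, 0.125])
print(prefix_sum_rmse(identity, sigma=1.0))
print(prefix_sum_rmse(decaying, sigma=1.0))
```

Restricting to Toeplitz matrices, as the TL;DR notes is often done, reduces the free parameters from roughly $n^2/2$ to $n$, which is why the sketch parameterizes $\mathbf{C}$ by its first column.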

Paper Structure (31 sections, 4 theorems, 14 equations, 27 figures, 2 algorithms)

Introduction
Our Contributions
Algorithmic Contributions
Empirical Evaluation
Background and Prior Work
Comparison to Chua et al.
Future Directions
Privacy Loss Distributions and Monte Carlo Accounting
Balls-in-bins batching
Monte Carlo Accounting for Correlated Noise Mechanisms
Calibrating $\sigma$
Optimizing over matrices
Experiments
RMSE Results
Analysis of the amplification schemes
...and 16 more sections

Key Result

Theorem 2.1

Suppose that, in Algorithm alg:wrapper, for any $P, Q$ such that $H_\varepsilon(P, Q) > \tau \delta$, we have $\hat{\delta} > \delta$ with probability at least $1 - \tau \delta$. Then Algorithm alg:wrapper is $(\varepsilon, \tau \delta)$-DP.
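As an illustration of the quantity $H_\varepsilon(P, Q)$ in the theorem, the following sketch estimates the hockey-stick divergence by Monte Carlo sampling, using the identity $H_\varepsilon(P, Q) = \mathbb{E}_{x \sim P}[\max(0, 1 - e^{\varepsilon - L(x)})]$ with $L(x) = \log(P(x)/Q(x))$. It assumes the simplest possible pair, $P = N(1, \sigma^2)$ and $Q = N(0, \sigma^2)$ (a Gaussian mechanism with sensitivity 1), for which a closed form is known and can serve as a check. This is not the paper's wrapper algorithm or its dominating pair for correlated noise; the function names are hypothetical.

```python
import math
import random


def privacy_loss(x, sigma):
    """log(P(x)/Q(x)) for P = N(1, sigma^2) and Q = N(0, sigma^2)."""
    return (2.0 * x - 1.0) / (2.0 * sigma ** 2)


def mc_hockey_stick(eps, sigma, num_samples, rng):
    """Monte Carlo estimate of H_eps(P, Q) from samples x ~ P."""
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(1.0, sigma)
        total += max(0.0, 1.0 - math.exp(eps - privacy_loss(x, sigma)))
    return total / num_samples


def exact_hockey_stick(eps, sigma):
    """Closed-form H_eps for the same Gaussian pair (sensitivity 1)."""
    def phi(z):
        # Standard normal CDF via the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    return phi(0.5 / sigma - eps * sigma) - math.exp(eps) * phi(-0.5 / sigma - eps * sigma)


rng = random.Random(0)  # fixed seed so the demo is reproducible
estimate = mc_hockey_stick(eps=1.0, sigma=1.0, num_samples=100_000, rng=rng)
print("Monte Carlo estimate:", estimate)
print("Closed form:         ", exact_hockey_stick(eps=1.0, sigma=1.0))
```

Each summand lies in $[0, 1]$, so the estimate concentrates around the true divergence at rate $O(1/\sqrt{m})$ in the number of samples $m$; a condition such as the one in the theorem turns this kind of concentration into a formal guarantee.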

Figures (27)

Figure 1: A comparison of how different amplification methods might form batches. Here, we use $n = 6$ rounds, and form $b = 3$ batches per epoch for $E = 2$ epochs (visually represented as one row for each epoch). Note that shuffling uses fixed batch sizes, and all methods but Poisson sampling use the same batching across epochs and enforce exactly one participation per example per epoch.
Figure 2: Comparisons using fixed $\mathbf{C}$.
Figure 3: Comparisons for variable $\mathbf{C}$.
Figure 4: Time (in seconds) to optimize $\mathbf{C}$ for different values of Monte Carlo samples used per gradient descent iteration, number of epochs, and iterations per epoch.
Figure 5: Percentage increase in RMSE due to using a smaller number of samples per iteration in the optimization procedure, when compared to the RMSE achieved by using $2^{20}$ samples per iteration.
...and 22 more figures
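The batching schemes contrasted in the Figure 1 caption can be sketched as follows, using the caption's setting of $b = 3$ batches per epoch and $E = 2$ epochs ($n = 6$ rounds). The sketch assumes that balls-in-bins assigns each example independently and uniformly to one of the $b$ batches and then reuses that assignment every epoch, which matches the caption's description but is a simplification; the function names and the choice of 12 examples are hypothetical.

```python
import random


def balls_in_bins(num_examples, b, rng):
    """Throw each example (ball) independently into one of b batches (bins)."""
    bins = [[] for _ in range(b)]
    for example in range(num_examples):
        bins[rng.randrange(b)].append(example)
    return bins


def shuffled_batches(num_examples, b, rng):
    """Shuffle once, then cut into b batches of equal, fixed size."""
    order = list(range(num_examples))
    rng.shuffle(order)
    size = num_examples // b  # assumes b divides num_examples
    return [order[k * size:(k + 1) * size] for k in range(b)]


def repeat_over_epochs(batches, epochs):
    """Reuse the same batching in every epoch: round t uses batch t mod b."""
    b = len(batches)
    return [batches[t % b] for t in range(epochs * b)]


def poisson_rounds(num_examples, n_rounds, p, rng):
    """Poisson sampling: every round includes each example independently w.p. p."""
    return [[ex for ex in range(num_examples) if rng.random() < p]
            for _ in range(n_rounds)]


rng = random.Random(0)
b, epochs, num_examples = 3, 2, 12
schedules = {
    "balls-in-bins": repeat_over_epochs(balls_in_bins(num_examples, b, rng), epochs),
    "shuffling": repeat_over_epochs(shuffled_batches(num_examples, b, rng), epochs),
    "Poisson": poisson_rounds(num_examples, b * epochs, 1.0 / b, rng),
}
for name, rounds in schedules.items():
    print(name, [len(batch) for batch in rounds])
```

Printing the batch sizes shows the distinction drawn in the caption: shuffling yields fixed sizes, balls-in-bins yields random sizes with exactly one participation per example per epoch, and Poisson sampling yields random sizes with no per-epoch participation guarantee.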

Theorems & Definitions (7)

Theorem 2.1: Theorem 9 of Wang et al., 2023 (wang2023randomized)
Theorem 2.2
proof
Definition 3.1
Lemma 3.2: Dimension-reduction for balls-in-bins batching
proof
Lemma 3.3
