A Smooth Binary Mechanism for Efficient Private Continual Observation

Joel Daniel Andersson; Rasmus Pagh

TL;DR

The authors address private continual counting by introducing the Smooth Binary Mechanism, a variant of the classic binary mechanism that releases prefix sums whose noise is identically distributed at every time step and has lower variance. By using only leaves with balanced binary indices and slightly increasing the tree height, the mechanism achieves a per-output variance of $\mathrm{Var}[\mathcal{M}(t)] = \frac{1+o(1)}{8\rho} \log^2 T$ while using only $O(\log T)$ space and $O(T)$ time to output all $T$ prefix sums, that is, constant average work per output. Although it does not match the lowest variance achieved by matrix factorization methods, it offers a practical, scalable alternative with uniform error across time steps, making it attractive for large-scale, real-time private counting. The paper provides a detailed analysis and empirical comparisons of variance and computational cost against the binary mechanism and the mechanism of Henzinger et al., and notes applicability to private SGD and other continual-observation tasks. The framework can be extended to $\varepsilon$-DP and multidimensional data, with potential future improvements in variance and scalability.

Abstract

In privacy under continual observation, we study how to release differentially private estimates based on a dataset that evolves over time. The problem of releasing private prefix sums of $x_1,x_2,x_3,\dots \in\{0,1\}$ (where the value of each $x_i$ is to be private) is particularly well-studied, and a generalized form is used in state-of-the-art methods for private stochastic gradient descent (SGD). The seminal binary mechanism privately releases the first $t$ prefix sums with noise of variance polylogarithmic in $t$. Recently, Henzinger et al. and Denisov et al. showed that it is possible to improve on the binary mechanism in two ways: the variance of the noise can be reduced by a (large) constant factor, and also made more even across time steps. However, their algorithms for generating the noise distribution are not as efficient as one would like in terms of computation time and (in particular) space. We address the efficiency problem by presenting a simple alternative to the binary mechanism in which 1) generating the noise takes constant average time per value, 2) the variance is reduced by a factor of about 4 compared to the binary mechanism, and 3) the noise distribution at each step is identical. Empirically, a simple Python implementation of our approach outperforms the running time of the approach of Henzinger et al., as well as an attempt to improve their algorithm using high-performance algorithms for multiplication with Toeplitz matrices.
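
To make the baseline concrete, the following is a minimal offline sketch of the classic binary mechanism that the abstract refers to. It is not the authors' code: the function names `dyadic_nodes` and `binary_mechanism` are hypothetical, the noise is assumed to be Gaussian and calibrated for $\rho$-zCDP as $\sigma^2 = \Delta^2/(2\rho)$ with $\Delta^2$ bounded by the number of tree levels, and the whole stream is held in memory rather than processed in $O(\log T)$ space.

```python
import math
import random


def dyadic_nodes(t):
    """Decompose the prefix [0, t) into dyadic blocks, one per set bit of t.

    Returns (level, start) pairs; a block at `level` covers 2**level leaves.
    """
    nodes = []
    start = 0
    for level in reversed(range(t.bit_length())):
        if (t >> level) & 1:
            nodes.append((level, start))
            start += 1 << level
    return nodes


def binary_mechanism(xs, rho, rng=None):
    """Noisy prefix sums of the bit stream `xs` (offline illustration only)."""
    rng = rng or random.Random(0)
    T = len(xs)
    # Each input bit lies in at most one used block per level, so the squared
    # L2 sensitivity is at most the number of levels (assumed calibration).
    levels = max(1, T.bit_length())
    sigma = math.sqrt(levels / (2 * rho))
    noisy = {}  # (level, start) -> noisy block sum, drawn once and reused
    out = []
    for t in range(1, T + 1):
        total = 0.0
        for level, start in dyadic_nodes(t):
            if (level, start) not in noisy:
                true_sum = sum(xs[start:start + (1 << level)])
                noisy[(level, start)] = true_sum + rng.gauss(0.0, sigma)
            total += noisy[(level, start)]
        out.append(total)
    return out
```

In this sketch the number of noisy blocks added at time $t$ equals the number of 1s in the binary representation of $t$, which is why the error of the binary mechanism varies from step to step.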

Paper Structure (29 sections, 8 theorems, 9 equations, 5 figures, 1 table, 2 algorithms)

Introduction
Our contributions
Sketch of technical ideas.
Limitations.
Related work
Preliminaries
Binary representation of numbers.
Partial sums (p-sums).
Continual observation of bit stream
Differential privacy
Differentially private continual counting
Binary mechanism.
Factorization mechanism.
Extension to multidimensional input.
Our mechanism
...and 14 more sections

Key Result

Theorem 1.1

For any $\rho > 0$, $T > 1$, there is an efficient $\rho$-zCDP continual counting mechanism $\mathcal{M}$ that, on receiving a binary stream of length $T$, satisfies $\mathrm{Var}[\mathcal{M}(t)] = \frac{1+o(1)}{8\rho} \log^2 T$, where $\mathcal{M}(t)$ is the output prefix sum at time $t$, while only requiring $O(\log T)$ space, $O(T)$ time to output all $T$ prefix sums, and where the error is identically distributed for all $1\leq t \leq T$.
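
The constant $1/8$ in the variance bound can be made plausible with a short back-of-the-envelope calculation. This is a sketch under stated assumptions, not the paper's proof: it assumes a tree of height $h$ in which every output sums $h/2$ independently perturbed nodes (as in the caption of Figure 2), that each input bit contributes to $h/2$ stored nodes, that Gaussian noise is calibrated for $\rho$-zCDP, and that $h = (1+o(1))\log T$ with the logarithm taken base 2.

$$
\begin{aligned}
\Delta^2 &= \frac{h}{2} && \text{(assumed squared } \ell_2 \text{ sensitivity)}\\
\sigma^2 &= \frac{\Delta^2}{2\rho} = \frac{h}{4\rho} && \text{(Gaussian mechanism under } \rho\text{-zCDP)}\\
\mathrm{Var}[\mathcal{M}(t)] &= \frac{h}{2}\,\sigma^2 = \frac{h^2}{8\rho} = \frac{1+o(1)}{8\rho}\log^2 T && \text{(}h/2 \text{ noisy nodes per output)}
\end{aligned}
$$

Because the count of summed nodes is the same at every time step, the same calculation applies to all $1 \leq t \leq T$.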

Figures (5)

Figure 1: Binary trees for a sequence of length $T=7$. In the panel explaining the binary mechanism, each leaf is labeled by $\mathop{\mathrm{bin}}\nolimits(t-1)$, and the panel illustrates how the prefix sum up to $t=6$ can be computed from $\mathop{\mathrm{bin}}\nolimits(t)$. Blue nodes describe the path taken by Algorithm 1, and the sum of the red nodes forms the desired output $\mathcal{M}(6)$.
Figure 2: Computation of $\mathcal{M}(5)$ using the smooth binary mechanism where $T=5$. The figure illustrates how the prefix sum up to $t$ can be computed from $\mathop{\mathrm{bin}}\nolimits(m(t+1))$. Blue nodes describe the path taken by Algorithm 2, and the sum of the red nodes forms the desired output $\mathcal{M}(t)$. All values shown in nodes are stored noisily, and $\Sigma[i, j]$ is here defined as the sum of leaves $i$ through $j$. Observe that the noise in each node is drawn from the same distribution and that each query is formed by adding together $h/2=2$ noisy nodes, implying an identical distribution of the error at each step.
Figure 3: The least significant bits of two leaf indices in a full binary tree that are neighboring with respect to time when used in the smooth binary mechanism. If the first cluster of 1s in $\mathop{\mathrm{bin}}\nolimits(m(t+1))$, counted from the least significant bit, has $n$ 1s then $n$ nodes in total will be replaced from $t$ to $t+1$.
Figure 4: Comparison of variance between the mechanism of Henzinger et al. (2023), the standard binary mechanism, and our mechanism. The running-variance panel shows $\mathop{\mathrm{Var}}\nolimits[\mathcal{M}(t)]$ for $1\leq t\leq T$ for $T=250$, whereas the maximum-variance panel shows the maximum variance that each mechanism would attain for a given upper bound on time. At the last time step in the maximum-variance panel, our mechanism reduces the variance by a factor of $3.27$ versus the binary mechanism.
Figure 5: Comparison of computational efficiency between the mechanism of Henzinger et al. (2023), the binary mechanism, and our mechanism. The time-performance panel shows the computation time spent per $d$-dimensional input. The simulation was run $5$ times for each method, so each method has $5$ data points in the plot per time step. The computation was performed for elements of dimension $d=10^4$ on a MacBook Pro 2021 with an Apple M1 Pro chip and 16 GB of memory, using Python 3.9.6, SciPy 1.9.2, and NumPy 1.23.3. The space-performance panel shows the maximum number of floats that have to be stored in memory when outputting all prefix sums up to a given time, assuming binary input.
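
The caption of Figure 2 describes how the smooth binary mechanism forms an output from $\mathop{\mathrm{bin}}\nolimits(m(t+1))$. The sketch below is one hedged reading of that description, not the authors' Algorithm 2: all function names are hypothetical, $m(t)$ is assumed to be the $t$-th smallest leaf index with exactly $h/2$ ones, the height $h$ is assumed to be the smallest even value with enough such leaves, the noise scale $\sigma^2 = h/(4\rho)$ is an assumed Gaussian calibration, and the code runs offline instead of in $O(\log T)$ space.

```python
import math
import random
from itertools import combinations


def balanced_indices(h):
    """All h-bit leaf indices with exactly h/2 ones, in increasing order."""
    idx = [sum(1 << b for b in bits) for bits in combinations(range(h), h // 2)]
    return sorted(idx)


def left_blocks(m):
    """Dyadic blocks covering leaves [0, m): one (level, start) per set bit of m."""
    blocks, start = [], 0
    for level in reversed(range(m.bit_length())):
        if (m >> level) & 1:
            blocks.append((level, start))
            start += 1 << level
    return blocks


def smooth_binary_mechanism(xs, rho, rng=None):
    """Noisy prefix sums where every output adds exactly h/2 noisy nodes."""
    rng = rng or random.Random(0)
    T = len(xs)
    h = 2
    while math.comb(h, h // 2) - 1 < T:  # assumed rule for choosing the height
        h += 2
    m = balanced_indices(h)  # m[t - 1] is the leaf assumed to store x_t
    leaf_value = {m[i]: xs[i] for i in range(T)}  # all other leaves hold 0
    sigma = math.sqrt(h / (4 * rho))  # assumed calibration, see lead-in
    noisy = {}  # (level, start) -> noisy block sum, drawn once and reused
    out = []
    for t in range(1, T + 1):
        total = 0.0
        # m[t] is the (t+1)-th balanced index; its blocks cover x_1..x_t.
        for level, start in left_blocks(m[t]):
            key = (level, start)
            if key not in noisy:
                end = start + (1 << level)
                true_sum = sum(v for leaf, v in leaf_value.items()
                               if start <= leaf < end)
                noisy[key] = true_sum + rng.gauss(0.0, sigma)
            total += noisy[key]
        out.append(total)
    return out
```

For $T=5$ this sketch selects $h=4$ and the leaf indices 3, 5, 6, 9, 10, 12, so each output sums two noisy nodes, in line with the $h/2=2$ stated in the caption of Figure 2.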

Theorems & Definitions (14)

Theorem 1.1: Smooth Binary Mechanism
Definition 2.1: Continual Counting Query
Definition 2.2: Counting Mechanism
Definition 2.3: Neighboring Streams
Definition 2.4: Concentrated Differential Privacy (zCDP) [BunS16]
Lemma 2.5: Gaussian Mechanism [BunS16]
Proposition 3.1
Proposition 3.2
Theorem 3.3: Exact Variance for Binary Mechanism [chan_private_2011, dwork_differential_2010]
Lemma 3.4
...and 4 more
