Private Continual Counting of Unbounded Streams

Ben Jacobsen, Kassem Fawaz

TL;DR

This work tackles differentially private continual counting on unbounded data streams by introducing a logarithmically perturbed lower-triangular Toeplitz (LTToep) matrix factorization that exactly represents the lower-triangular all-ones counting matrix M_count as LR. The authors prove joint validity, bounded sensitivity, and near-optimal asymptotic error for the resulting unbounded streaming algorithm. They also provide an efficient, implementable procedure with O(t) space and amortized O(log t) time per update. Further practical extensions include parameter choices, handling of imperfect knowledge of n, and hybrid mechanisms that improve constants relative to prior baselines. Empirically, the proposed log_matrix method keeps its variance below 1.5× that of the bounded-input algorithm of Henzinger et al. for t up to 2^{24}, while providing smooth error and privacy guarantees that do not depend on a known stream length. This work thus delivers a principled, scalable DP solution for continual counting that does not require the input size to be known in advance.
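The exactness of an LR factorization of M_count can be checked directly. A product of lower-triangular Toeplitz matrices is the lower-triangular Toeplitz matrix of the product of their generating functions, and M_count corresponds to 1/(1-z). So any f_L with a nonzero constant term has a unique partner f_R = 1/((1-z) f_L). The following is a minimal sketch using exact rational arithmetic; the helper names (`ltoep`, `series_div`, `sqrt_coeffs`) are hypothetical and do not come from the paper. Because the matrices are lower-triangular, the leading n x n block of the infinite product equals the product of the n x n blocks. A check on an 8 x 8 truncation is therefore exact for that block. With f_L(z) = (1-z)^{-1/2} it recovers the square-root factorization used in prior work. The paper's logarithmic perturbations would replace f_L.

```python
from fractions import Fraction

def ltoep(coeffs, n):
    """n x n lower-triangular Toeplitz matrix whose first column is coeffs[:n]."""
    return [[coeffs[i - j] if i >= j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain square matrix product on lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def series_div(num, den, n):
    """First n coefficients of num(z) / den(z); requires den[0] != 0."""
    q = []
    for k in range(n):
        s = num[k] - sum(q[j] * den[k - j] for j in range(k))
        q.append(s / den[0])
    return q

def sqrt_coeffs(n):
    """Coefficients of (1 - z)^(-1/2): r_0 = 1, r_k = r_{k-1} * (2k - 1) / (2k)."""
    r = [Fraction(1)]
    for k in range(1, n):
        r.append(r[-1] * Fraction(2 * k - 1, 2 * k))
    return r

n = 8
ones = [Fraction(1)] * n                  # 1/(1 - z): generating function of M_count
f_L = sqrt_coeffs(n)
f_R = series_div(ones, f_L, n)            # f_R = 1 / ((1 - z) f_L)
M = matmul(ltoep(f_L, n), ltoep(f_R, n))  # should equal the all-ones lower triangle

# Any f_L with a nonzero constant term works; this sequence is arbitrary.
f_L_alt = [Fraction(1, 2 * k + 1) for k in range(n)]
f_R_alt = series_div(ones, f_L_alt, n)
M_alt = matmul(ltoep(f_L_alt, n), ltoep(f_R_alt, n))
```

In the square-root case the solver returns f_R equal to f_L, since ((1-z)^{-1/2})^2 = 1/(1-z).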

Abstract

We study the problem of differentially private continual counting in the unbounded setting where the input size $n$ is not known in advance. Current state-of-the-art algorithms based on optimal instantiations of the matrix mechanism cannot be directly applied here because their privacy guarantees only hold when key parameters are tuned to $n$. Using the common 'doubling trick' avoids knowledge of $n$ but leads to suboptimal and non-smooth error. We solve this problem by introducing novel matrix factorizations based on logarithmic perturbations of the function $\frac{1}{\sqrt{1-z}}$ studied in prior works, which may be of independent interest. The resulting algorithm has smooth error, and for any $\alpha > 0$ and $t \leq n$ it is able to privately estimate the sum of the first $t$ data points with $O(\log^{2+2\alpha}(t))$ variance. It requires $O(t)$ space and amortized $O(\log t)$ time per round, compared to $O(\log(n)\log(t))$ variance, $O(n)$ space and $O(n \log n)$ pre-processing time for the nearly-optimal bounded-input algorithm of Henzinger et al. (SODA 2023). Empirically, we find that our algorithm's performance is also comparable to theirs in absolute terms: our variance is less than $1.5\times$ theirs for $t$ as large as $2^{24}$.
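The next example shows why a bounded-input factorization must be tuned to $n$. It uses the square-root factorization $L = R = \mathrm{LTToep}((1-z)^{-1/2})$, with coefficients $r_k = \binom{2k}{k}/4^k \approx 1/\sqrt{\pi k}$. Under the Gaussian matrix mechanism the noise scale is set by the largest column norm of $R$, which is column 0. Over $n$ rounds its squared norm is $\sum_{k<n} r_k^2 \approx \ln(n)/\pi$. The variance of the $t$-th estimate is that quantity times $\sum_{k<t} r_k^2$, which reproduces the $O(\log(n)\log(t))$ rate quoted above. Because the column norm diverges as $n \to \infty$, noise calibrated for one $n$ no longer covers longer streams. This sketch assumes unit sensitivity per entry and normalizes the privacy-dependent noise multiplier to 1. The function names are illustrative.

```python
import math

def sqrt_coeffs(m):
    """First m coefficients of (1 - z)^(-1/2), i.e. r_k = binom(2k, k) / 4^k."""
    r = [1.0]
    for k in range(1, m):
        r.append(r[-1] * (2 * k - 1) / (2 * k))
    return r

def prefix_sq_norms(coeffs):
    """sq[m] = sum_{k<m} coeffs[k]^2, the squared norm of a length-m Toeplitz row or column."""
    sq = [0.0]
    for c in coeffs:
        sq.append(sq[-1] + c * c)
    return sq

def variance(t, n, sq):
    """Variance of the t-th prefix-sum estimate (1 <= t <= n) when L = R = M_count^(1/2).

    Noise is calibrated to the largest column norm of R over n rounds (column 0),
    with the privacy-dependent noise multiplier normalized to 1.
    """
    return sq[t] * sq[n]

N = 2 ** 16
sq = prefix_sq_norms(sqrt_coeffs(N))

# The squared sensitivity keeps growing with n (roughly ln(n) / pi), so a noise
# level tuned to one n does not cover longer streams.
for n in (2 ** 4, 2 ** 8, 2 ** 12, 2 ** 16):
    print(n, sq[n], variance(n, n, sq))
```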

Paper Structure

This paper contains 23 sections, 3 theorems, 11 equations, 2 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

For all $\alpha > 0$, there exists an infinite lower-triangular Toeplitz matrix factorization $L, R \in \mathbb{R}^{\infty \times \infty}$ with the following properties:

Figures (2)

  • Figure 1: Relative error $|\hat{r}_t - r_t|/r_t$ of the coefficients computed by the approx algorithm, as a function of $t$ when $\gamma=-0.51$. The asymptotic expansion converges much more quickly when $\delta=0$.
  • Figure 2: Left: Comparison of the exact variance of different algorithms and parameter choices; contrast with the asymptotics in the comparison table. The Hybrid mechanism using the log_matrix algorithm for the unbounded component outperforms the variant using independent noise as in Chan et al. (2011), but both variants perform erratically when $t$ is close to a power of 2. For the log_matrix algorithm, the parameters $\delta=-\gamma$ and $\delta=-6\gamma/5$ give similar performance, but the latter is slightly better once $t>2^{20}$, and both outperform $\delta=0$ when $t>2^{10}$. The performance of the approx algorithm with $\delta=0$ is visually indistinguishable from that of the log_matrix algorithm. Right: Coefficients of the matrices $\mathrm{LTToep}(f_L)$ (upper) and $\mathrm{LTToep}(f_R)$ (lower) for various choices of $\delta$. The choice $\delta=-6\gamma/5$ produces lines that are as close as possible to that of $M_{\textit{count}}^{1/2}$ without crossing it, which we hypothesize explains its good performance.
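The upper/lower coefficient curves in the right panel of Figure 2 can be imitated with a hypothetical perturbation of the same kind. This sketch assumes $g(z) = \frac{1}{z}\ln\frac{1}{1-z}$, $f_R(z) = (1-z)^{-1/2} g(z)^{\gamma}$ and $f_L(z) = (1-z)^{-1/2} g(z)^{-\gamma}$, with $\gamma = -0.51$ (the value in Figure 1's caption). It ignores the paper's $\delta$ parameter and is not claimed to be the authors' exact $f_L, f_R$. Since $f_L f_R = 1/(1-z)$, the pair still factors $M_{\textit{count}}$ exactly. Standard singularity analysis suggests $([z^k] f_R)^2$ behaves like $1/(\pi k \ln^{1+2\alpha} k)$ with $\alpha = -\gamma - 1/2 = 0.01$. That series converges, so the sensitivity would stay bounded on an unbounded stream. Powers of $g$ use the classical J.C.P. Miller recurrence for raising a power series to a real power.

```python
import math

def log_series(m):
    """Coefficients of g(z) = (1/z) * ln(1/(1 - z)) = sum_k z^k / (k + 1)."""
    return [1.0 / (k + 1) for k in range(m)]

def series_pow(g, a):
    """Coefficients of g(z)**a for real a, assuming g[0] == 1 (J.C.P. Miller recurrence)."""
    h = [1.0]
    for n in range(1, len(g)):
        s = sum(((a + 1) * k - n) * g[k] * h[n - k] for k in range(1, n + 1))
        h.append(s / n)
    return h

def series_mul(p, q):
    """Cauchy product of two coefficient lists, truncated to the shorter length."""
    m = min(len(p), len(q))
    return [sum(p[j] * q[n - j] for j in range(n + 1)) for n in range(m)]

def sqrt_coeffs(m):
    """Coefficients of (1 - z)^(-1/2), i.e. of M_count^(1/2)."""
    r = [1.0]
    for k in range(1, m):
        r.append(r[-1] * (2 * k - 1) / (2 * k))
    return r

M = 1001
gamma = -0.51                               # hypothetical; alpha = -gamma - 1/2 = 0.01
r = sqrt_coeffs(M)
g = log_series(M)
f_R = series_mul(r, series_pow(g, gamma))   # damped by (ln k)^gamma: lies below r
f_L = series_mul(r, series_pow(g, -gamma))  # boosted by (ln k)^(-gamma): lies above r

for k in (1, 10, 100, 1000):
    print(k, f_L[k], r[k], f_R[k])
```

For $\gamma \in (-1, 0)$, every coefficient of $g^{\gamma}$ after the first is non-positive and every coefficient of $g^{-\gamma}$ is non-negative. So in this hypothetical the curves for $f_L$ and $f_R$ sit on opposite sides of $M_{\textit{count}}^{1/2}$ without crossing it.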

Theorems & Definitions (3)

  • Theorem 1
  • Corollary 1
  • Theorem 2: Paraphrased from Flajolet and Odlyzko (1990), Theorem 3B
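Theorem 2 is a singularity-analysis transfer result. In its standard form it gives $[z^n](1-z)^{-a}\left(\frac{1}{z}\ln\frac{1}{1-z}\right)^{b} \sim \frac{n^{a-1}}{\Gamma(a)}(\ln n)^{b}$ for $a \notin \{0, -1, -2, \dots\}$. The exact statement paraphrased in the paper may differ. The numeric check below covers only the simplest case, $a = 1/2$ and $b = 0$. There the coefficients are $\binom{2n}{n}/4^n$ and the prediction is $1/\sqrt{\pi n}$, and the ratio approaches 1 roughly like $1 - 1/(8n)$. When $b \neq 0$, the lower-order corrections decay only like powers of $1/\ln n$, so the leading term is a much rougher approximation at practical $n$. The function names are illustrative.

```python
import math

def sqrt_coeff(n):
    """[z^n] (1 - z)^(-1/2) = binom(2n, n) / 4^n, computed in log space to avoid overflow."""
    return math.exp(math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1) - n * math.log(4))

def transfer_leading_term(n, a=0.5):
    """Leading term n^(a - 1) / Gamma(a) predicted for [z^n] (1 - z)^(-a) (case b = 0)."""
    return n ** (a - 1) / math.gamma(a)

ratios = {n: sqrt_coeff(n) / transfer_leading_term(n) for n in (10, 100, 1000, 10000)}
for n, ratio in ratios.items():
    print(n, ratio)  # approaches 1 from below, roughly like 1 - 1/(8n)
```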