
Efficient and Near-Optimal Noise Generation for Streaming Differential Privacy

Krishnamurthy Dvijotham, H. Brendan McMahan, Krishna Pillutla, Thomas Steinke, Abhradeep Thakurta

TL;DR

This paper tackles the problem of differentially private continual counting in streaming settings by encoding noise addition as a matrix factorization problem for the all-ones lower-triangular matrix $A$. It introduces Buffered Linear Toeplitz (BLT) matrices, which enable efficient streaming noise generation when the Toeplitz coefficients are generated by low-degree rational functions, notably rational approximations of $1/\sqrt{1-x}$. The authors present two primary approaches: RA-BLT, which uses rational function approximations to achieve near-optimality with polylogarithmic memory, and Opt-BLT, which optimizes BLT parameters via gradient-based methods and can closely match the Toeplitz optimum in practice. They further generalize these ideas by combining BLTs with a generalized binary-tree construction to reach near-optimal performance with $\tilde{O}(\log n)$ space, and provide empirical comparisons showing the practical competitiveness of their methods. The work also develops rigorous bounds and efficient algorithms for computing error metrics and supports direct optimization of BLT parameters, making the proposed mechanisms directly applicable to privacy-preserving streaming ML systems such as private continual learning and DP-FTRL.
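The streaming property of BLT matrices can be illustrated with a short sketch. It assumes, for illustration only, a degree-$d$ BLT with buffer-decay parameters $\theta_j$ and output scales $\omega_j$ whose Toeplitz coefficients are $c_0 = 1$ and $c_k = \sum_j \omega_j \theta_j^{k-1}$ for $k \ge 1$. Under that assumption, row $t$ of the product depends on past inputs only through $d$ exponentially decaying buffers, so memory is $O(d)$ per coordinate instead of $O(t)$. The names `lower_toeplitz`, `blt_coefficients`, and `StreamingBLT` are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def lower_toeplitz(c):
    """Dense lower-triangular Toeplitz matrix whose first column is c."""
    n = len(c)
    m = np.zeros((n, n))
    for i in range(n):
        m[i, : i + 1] = c[i::-1]
    return m


def blt_coefficients(theta, omega, n):
    """First n Toeplitz coefficients under the ASSUMED parameterization
    c_0 = 1 and c_k = sum_j omega_j * theta_j**(k - 1) for k >= 1."""
    theta = np.asarray(theta, dtype=float)
    omega = np.asarray(omega, dtype=float)
    c = np.ones(n)
    for k in range(1, n):
        c[k] = np.sum(omega * theta ** (k - 1))
    return c


class StreamingBLT:
    """Hypothetical helper: multiplies a stream x_0, x_1, ... by the BLT matrix
    while storing only one buffer per (theta_j, omega_j) pair."""

    def __init__(self, theta, omega, dim):
        self.theta = np.asarray(theta, dtype=float)
        self.omega = np.asarray(omega, dtype=float)
        # Buffer j holds S_j = sum_{s < t} theta_j**(t - 1 - s) * x_s.
        self.buffers = np.zeros((len(self.theta), dim))

    def step(self, x):
        """Consume x_t and return row t of (BLT @ X)."""
        x = np.asarray(x, dtype=float)
        y = x + self.omega @ self.buffers
        # Decay every buffer by its theta_j, then absorb the new input.
        self.buffers = self.theta[:, None] * self.buffers + x
        return y


# Usage with arbitrary (not optimized) degree-2 parameters.
theta, omega = [0.9, 0.5], [0.3, 0.2]
n, dim = 50, 3
X = np.random.default_rng(0).standard_normal((n, dim))
blt = StreamingBLT(theta, omega, dim)
Y = np.stack([blt.step(x) for x in X])
dense = lower_toeplitz(blt_coefficients(theta, omega, n)) @ X
print("max |streaming - dense|:", np.abs(Y - dense).max())
```

Under this assumed parameterization the generating function of the coefficients is $1 + \sum_j \omega_j x/(1-\theta_j x)$, a rational function of degree $d$, which matches the link between BLTs and low-degree rational functions described above.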

Abstract

In the task of differentially private (DP) continual counting, we receive a stream of increments and our goal is to output an approximate running total of these increments, without revealing too much about any specific increment. Despite its simplicity, differentially private continual counting has attracted significant attention both in theory and in practice. Existing algorithms for differentially private continual counting are either inefficient in terms of their space usage or add an excessive amount of noise, inducing suboptimal utility. The most practical DP continual counting algorithms add carefully correlated Gaussian noise to the values. The task of choosing the covariance for this noise can be expressed in terms of factoring the lower-triangular matrix of ones (which computes prefix sums). We present two approaches from this class (for different parameter regimes) that achieve near-optimal utility for DP continual counting and only require logarithmic or polylogarithmic space (and time). Our first approach is based on a space-efficient streaming matrix multiplication algorithm for a class of Toeplitz matrices. We show that to instantiate this algorithm for DP continual counting, it is sufficient to find a low-degree rational function that approximates the square root on a circle in the complex plane. We then apply and extend tools from approximation theory to achieve this. We also derive efficient closed forms for the objective function for arbitrarily many steps, and show that direct numerical optimization yields a highly practical solution to the problem. Our second approach combines our first approach with a recursive construction similar to the binary tree mechanism.
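Lower-triangular Toeplitz matrices multiply like truncated power series, so once the noise-correlation factor $C$ of a factorization $A = BC$ is fixed, $B = AC^{-1}$ is determined. The sketch below shows that bookkeeping, assuming both factors are lower-triangular Toeplitz; `lower_toeplitz`, `series_inverse`, and `complete_factorization` are hypothetical helper names, and the geometric choice of $C$ is purely illustrative.

```python
import numpy as np


def lower_toeplitz(c):
    """Dense lower-triangular Toeplitz matrix whose first column is c."""
    n = len(c)
    m = np.zeros((n, n))
    for i in range(n):
        m[i, : i + 1] = c[i::-1]
    return m


def series_inverse(c):
    """Truncated power-series inverse d of c, so that c(x) d(x) = 1 + O(x^n)."""
    n = len(c)
    d = np.zeros(n)
    d[0] = 1.0 / c[0]
    for k in range(1, n):
        # Coefficient k of c*d must vanish: sum_{j=0..k} c_j d_{k-j} = 0.
        d[k] = -np.dot(c[1 : k + 1], d[k - 1 :: -1]) / c[0]
    return d


def complete_factorization(c):
    """Coefficients b such that lower_toeplitz(b) @ lower_toeplitz(c) equals A."""
    c = np.asarray(c, dtype=float)
    # A has generating function 1/(1 - x); dividing by c(x) and then multiplying
    # by 1/(1 - x) amounts to inverting the series and taking prefix sums.
    return np.cumsum(series_inverse(c))


n = 8
c = 0.7 ** np.arange(n)          # illustrative C with generating function 1/(1 - 0.7x)
b = complete_factorization(c)    # approximately [1, 0.3, 0.3, ...]: b(x) = (1 - 0.7x)/(1 - x)
A = np.tril(np.ones((n, n)))
print(np.allclose(lower_toeplitz(b) @ lower_toeplitz(c), A))  # prints True
```

Choosing $C$'s coefficients as those of $(1-x)^{-1/2}$ makes the same routine return $B = C$, the symmetric square-root factorization of $A$.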

Paper Structure

This paper contains 46 sections, 28 theorems, 224 equations, 4 figures, 1 table, and 2 algorithms.

Key Result

Theorem 1.1

For each integer $n \ge 1$ and error parameter $\mu \in (0,1)$, there exists a lower-triangular Toeplitz matrix factorization $B, C \in \mathbb{R}^{n \times n}$ with the following properties.

Figures (4)

  • Figure 1: Ratio of $\mathrm{MaxErr}(B,C)$ of our RA-BLT and Opt-BLT mechanisms, for different numbers of steps $n$ and degrees $d$ (the degree corresponds directly to the space complexity), to that of the optimal Toeplitz mechanism of Fichtenberger et al. (fichtenberger2022constant). This illustrates that even with modest degree $d$, we obtain near-optimal $\mathrm{MaxErr}(B,C)$ for large numbers of steps $n$. For example, Opt-BLT with $d=5$ is within $1\%$ of optimal for $n=10^7$ (we do not plot Opt-BLT for $d=9$).
  • Figure 2: Comparison of known upper and lower bounds for factorizations $A=BC$ of the all-ones lower-triangular matrix $A_{i,j}=\mathbb{I}[i\ge j]$. The bounds cover non-Toeplitz factorizations as well. This illustrates that there is a small gap between lower-triangular Toeplitz factorizations and general factorizations; furthermore, this gap is asymptotically constant. Left: vertical axis is $\mathrm{MaxErr}(B,C)$. Right: vertical axis is $\mathrm{MaxErr}(B,C)-\mathsf{OptLTToe}(n)$.
  • Figure 3: (Left) Comparison of three fixed Opt-BLT mechanisms across a range of $n$, extending beyond the values of $n$ they were optimized for. (Center) The first 200 Toeplitz coefficients defining the $C$ and $B$ matrices for Opt-BLT and RA-BLT with degree $d=2$, where the Opt-BLT mechanism is optimized for $n^* = 100$. (Right) Differences between the generating functions of the Opt-BLT and RA-BLT factorizations shown in the center column.
  • Figure 4: Semidefinite-programming-based lower bounds on optimal performance for various classes of matrices, based on the results of Theorem \ref{thm:SDPlower}.
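The figures benchmark against the optimal Toeplitz mechanism of Fichtenberger et al., which we take to be the symmetric square-root factorization $B = C = A^{1/2}$ whose Toeplitz coefficients are those of $(1-x)^{-1/2}$, namely $\binom{2k}{k}/4^k$. The sketch below computes $\mathrm{MaxErr}$ for it and for the two trivial factorizations, assuming $\mathrm{MaxErr}(B,C)$ is the largest row $\ell_2$-norm of $B$ times the largest column $\ell_2$-norm of $C$ (single participation). The helper names are hypothetical.

```python
import numpy as np


def lower_toeplitz(c):
    """Dense lower-triangular Toeplitz matrix whose first column is c."""
    n = len(c)
    m = np.zeros((n, n))
    for i in range(n):
        m[i, : i + 1] = c[i::-1]
    return m


def sqrt_coefficients(n):
    """First n coefficients of (1 - x)^{-1/2}, i.e. binom(2k, k) / 4**k."""
    c = np.ones(n)
    for k in range(1, n):
        c[k] = c[k - 1] * (2 * k - 1) / (2 * k)
    return c


def max_err(B, C):
    """ASSUMED MaxErr: largest row l2-norm of B times largest column l2-norm of C."""
    return np.linalg.norm(B, axis=1).max() * np.linalg.norm(C, axis=0).max()


n = 1000
A = np.tril(np.ones((n, n)))
S = lower_toeplitz(sqrt_coefficients(n))  # S @ S equals A up to rounding
print("sqrt factorization:", max_err(S, S))
print("B = A, C = I      :", max_err(A, np.eye(n)))  # equals sqrt(n)
print("B = I, C = A      :", max_err(np.eye(n), A))  # also sqrt(n)
```

For the square-root factorization the largest row of $B$ is its last row and the largest column of $C$ is its first column, so $\mathrm{MaxErr}$ equals $\sum_{k<n} c_k^2$, which grows logarithmically in $n$; both trivial factorizations have $\mathrm{MaxErr} = \sqrt{n}$.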

Theorems & Definitions (58)

  • Theorem 1.1: Main Result -- Informal version of \ref{thm:main-f}
  • Theorem 1.2: Secondary Result -- Informal version of \ref{thm:main2-f}
  • Lemma 2.1
  • proof
  • Proposition 2.2: Optimal lower triangular Toeplitz factorization
  • proof
  • Definition 2.3
  • Corollary 2.4
  • Lemma 3.1
  • proof
  • ...and 48 more