Continual Counting with Gradual Privacy Expiration

Joel Daniel Andersson; Monika Henzinger; Rasmus Pagh; Teresa Anna Steiner; Jalaj Upadhyay

TL;DR

This paper studies continual counting under differential privacy with gradual privacy expiration, introducing a flexible expiration function $g$ that models data becoming less sensitive over time. It develops a pan-private algorithm based on dyadic intervals, with four key adaptations that support unbounded streams, achieving an additive error of $O\left(\log T/\varepsilon\right)$ for a broad class of $g$, and proves a lower bound showing that this result is nearly tight. The mechanism runs in amortized $O(1)$ time per update and uses $O(\log T)$ space; empirical results indicate favorable privacy-utility trade-offs compared to a natural baseline, especially for large delays. The work thus provides tight, scalable, and practical guarantees for continual counting under expiration, with potential impact on private streaming analytics and privacy-preserving federated learning.
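The dyadic-interval idea can be illustrated with the classic binary-tree counter, sketched below in Python. This is a minimal sketch that assumes a known stream length $T$ and ordinary $\varepsilon$-DP *without* expiration. It is not the authors' algorithm, which adds four adaptations (including support for unbounded streams) to obtain gradual expiration. The names `BinaryMechanism` and `laplace` are hypothetical.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class BinaryMechanism:
    """Classic binary-tree (dyadic-interval) counter for a stream of known length T.

    Each bit x_t lies in exactly one dyadic interval per level, hence in at most
    `levels` interval sums. Adding Laplace(levels / eps) noise to every interval
    sum therefore gives eps-DP (without expiration). Only O(log T) partial sums
    are stored at any time.
    """

    def __init__(self, T: int, eps: float, seed=None):
        self.T = T
        self.levels = max(1, T.bit_length())    # floor(log2 T) + 1 levels
        self.scale = self.levels / eps          # Laplace scale per interval sum
        self.alpha = [0] * self.levels          # exact dyadic partial sums
        self.alpha_hat = [0.0] * self.levels    # noisy dyadic partial sums
        self.t = 0
        self.rng = random.Random(seed)

    def update(self, x: int) -> float:
        """Consume bit x_t and return a noisy estimate of x_1 + ... + x_t."""
        self.t += 1
        if self.t > self.T:
            raise ValueError("stream is longer than the horizon T")
        t = self.t
        i = (t & -t).bit_length() - 1           # index of the lowest set bit of t
        # Intervals at levels < i end at t-1; merge them with x_t into level i.
        self.alpha[i] = sum(self.alpha[:i]) + x
        for j in range(i):
            self.alpha[j] = 0
            self.alpha_hat[j] = 0.0
        self.alpha_hat[i] = self.alpha[i] + laplace(self.scale, self.rng)
        # The set bits of t select disjoint dyadic intervals that cover [1, t].
        return sum(self.alpha_hat[j] for j in range(self.levels) if (t >> j) & 1)


mech = BinaryMechanism(T=16, eps=1.0, seed=42)
noisy_prefix_sums = [mech.update(b) for b in [1, 0, 1, 1, 0, 1, 0, 0]]
```

Each output sums up to $\lfloor\log_2 T\rfloor + 1$ noisy interval sums, each carrying noise of scale about $\log T/\varepsilon$; this is what drives the polylogarithmic error of counting without expiration mentioned in the abstract.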

Abstract

Differential privacy with gradual expiration models the setting where data items arrive in a stream and, at a given time $t$, the privacy loss guaranteed for a data item seen at time $t-d$ is $\varepsilon g(d)$, where $g$ is a monotonically non-decreasing function. We study the fundamental $\textit{continual (binary) counting}$ problem, where each data item consists of a bit and the algorithm must output, at each time step, the sum of all bits streamed so far. For a stream of length $T$ and privacy $\textit{without}$ expiration, continual counting is possible with maximum (over all time steps) additive error $O(\log^2(T)/\varepsilon)$, and the best known lower bound is $\Omega(\log(T)/\varepsilon)$; closing this gap is a challenging open problem. We show that the situation is very different for privacy with gradual expiration by giving upper and lower bounds for a large set of expiration functions $g$. Specifically, our algorithm achieves an additive error of $O(\log(T)/\varepsilon)$ for a large set of privacy expiration functions. We also give a lower bound showing that if $C$ is the additive error of any $\varepsilon$-DP algorithm for this problem, then the product of $C$ and the privacy expiration function after $2C$ steps must be $\Omega(\log(T)/\varepsilon)$. Our algorithm matches this lower bound, as its additive error is $O(\log(T)/\varepsilon)$ even when $g(2C) = O(1)$. Our empirical evaluation shows that our algorithm's privacy loss grows slowly and that, for large values of $d$, its empirical privacy loss is significantly smaller than that of a natural baseline algorithm.
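To make the lower bound concrete, the sketch below finds, for a given expiration function $g$, the smallest integer $C$ with $C \cdot g(2C) \ge c \cdot \ln(T)/\varepsilon$. The constant $c$ is a hypothetical stand-in for the unspecified constant hidden in the $\Omega$ (set to $1$ here), and the two example choices of $g$ are illustrative, not taken from the paper. The function name `min_error_lower_bound` is hypothetical. It assumes $g$ is positive and non-decreasing, so $C \cdot g(2C)$ is non-decreasing in $C$ and a doubling-then-binary search is valid.

```python
import math
from typing import Callable


def min_error_lower_bound(g: Callable[[float], float], T: int, eps: float,
                          c: float = 1.0) -> int:
    """Smallest integer C >= 1 such that C * g(2C) >= c * ln(T) / eps.

    `c` is a hypothetical placeholder for the constant hidden in the Omega.
    Assumes g > 0 and non-decreasing, so C * g(2C) is non-decreasing in C.
    """
    target = c * math.log(T) / eps

    def ok(C: int) -> bool:
        return C * g(2 * C) >= target

    # Doubling phase: find the first power of two that satisfies the bound.
    hi = 1
    while not ok(hi):
        hi *= 2
    # The answer lies in (hi // 2, hi]; binary search for the smallest one.
    lo = hi // 2 + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if ok(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


T, eps = 2**20, 1.0
print(min_error_lower_bound(lambda d: 1.0, T, eps))                 # 14 (constant g)
print(min_error_lower_bound(lambda d: 1.0 + math.log2(d), T, eps))  # 4 (logarithmic g)
```

With constant $g$ (no expiration) the condition reduces to $C = \Omega(\log(T)/\varepsilon)$, the known lower bound for standard continual counting; a faster-growing $g$ permits a smaller $C$.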

Paper Structure (25 sections, 10 theorems, 36 equations, 5 figures, 1 table, 3 algorithms)


Introduction
Our Contributions
Technical Overview
Preliminaries
Warmup
A Simple Algorithm with Linear Privacy Expiration
A Binary-Tree-Based Algorithm with Logarithmic Privacy Expiration
Accuracy.
Proof of Theorem 1.2 and Corollary 1.3
Proof of Corollary 1.3
Lower Bound on the Privacy Decay
Empirical Evaluation
Conclusion
Acknowledgements
Empirical Evaluation
...and 10 more sections

Key Result

Theorem 1.2

Let $\lambda\in\mathbb{R}_{>0}\setminus\{\tfrac{3}{2}\}$ be a constant, and let parameters $\varepsilon \in \mathbb{R}_{>0}$ and $B\in\mathbb{N}$ be given. There exists an algorithm $\mathcal{A}$ that approximates prefix sums of a (potentially unbounded) input sequence $x_1, x_2, \dots$ with $x_i \in \{0,1\}$ […] Considering all releases up to and including input $t$, the algorithm $\mathcal{A}$ uses $O(B+\log\ldots)$ […]

Figures (5)

Figure 1: Plots of the privacy loss for our algorithm and a baseline algorithm.
Figure 2: Worst-case privacy loss computed empirically for a data item streamed $d$ steps earlier.
Figure 3: Worst-case privacy loss for a data item streamed $d$ steps earlier, shown for our algorithm (with $\lambda=1, 2, 3$) versus the baseline ($W=127$ and $W=1023$).
Figure 4: Worst-case privacy loss computed empirically for a data item streamed $d$ steps earlier. The first panel re-computes the baseline plot with the ratio $\varepsilon_{past}/\varepsilon_{cur}$ set to minimize the maximum privacy loss, yielding a ratio of $0.069$ for $W=31$, $0.08$ for $W=63$, and $0.095$ for $W=127$. The second panel re-computes the comparison plot of Figure 3 in the same way, yielding a ratio of $0.0064$ for $W=127$ and $0.010$ for $W=1023$.

Theorems & Definitions (20)

Definition 1.1
Theorem 1.2
Corollary 1.3
Theorem 1.4
Lemma 2.2
Lemma 3.1
Lemma 3.4
Lemma 4.1
Theorem 5.1
Definition B.1: Laplace Distribution
...and 10 more