Nebula: Efficient, Private and Accurate Histogram Estimation
Ali Shahin Shamsabadi, Peter Snyder, Ralph Giles, Aurélien Bellet, Hamed Haddadi
TL;DR
Nebula addresses private, distributed histogram estimation in an adversarial setting by combining sampling, thresholding, and dummy data with a secret-sharing protocol run across two non-colluding, untrusted servers. It uses a Verifiable Oblivious PRF (VOPRF) for randomness, $\tau$-out-of-$N$ secret sharing for client data, and dummy data injected via a truncated discrete Laplace mechanism to achieve $(\varepsilon,\delta)$-DP without trusted third parties. The approach extends to nested, high-dimensional marginal histograms (Nested-Nebula) and demonstrates strong utility, efficiency, and scalability on real datasets, including Census and Shakespeare. The work provides formal privacy and cryptographic analyses, empirical evaluations, and an open-source implementation, offering a practical path to private, scalable histogram estimation in distributed settings.
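To make the threshold secret-sharing idea concrete, the following is a minimal sketch of a tau-out-of-N scheme in which any tau shares recover a secret and fewer do not. It uses Shamir's scheme as a standard stand-in; the summary above does not say which scheme Nebula uses. The field size, the function names `make_shares` and `reconstruct`, and the seeded local RNG are all illustrative assumptions. In particular, the sketch draws polynomial coefficients from a local RNG rather than from a VOPRF, and it is not cryptographically secure.

```python
import random

# Assumption: a 61-bit Mersenne prime field, chosen only for illustration.
PRIME = 2**61 - 1


def make_shares(secret, tau, n, rng):
    """Split `secret` into n Shamir shares; any tau of them reconstruct it."""
    # Random polynomial of degree tau - 1 with constant term equal to the secret.
    # NOTE: `random` is not a cryptographic RNG; a real system would not use it.
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(tau - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, with `shares = make_shares(123456789, tau=3, n=5, rng=random.Random(0))`, calling `reconstruct` on any three of the five shares returns the original secret, while two shares interpolate to an unrelated field element.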
Abstract
We present \textit{Nebula}, a system for differentially private histogram estimation on data distributed among clients. \textit{Nebula} allows clients to independently decide whether to participate in the system, and to locally encode their data so that an untrusted server only learns data values whose multiplicity exceeds a predefined aggregation threshold, with $(\varepsilon,\delta)$ differential privacy guarantees. Compared to existing systems, \textit{Nebula} uniquely achieves: \textit{i)} a strict upper bound on client privacy leakage; \textit{ii)} significantly higher utility than standard local differential privacy systems; and \textit{iii)} no requirement for trusted third parties, multi-party computation, or trusted hardware. We provide a formal evaluation of \textit{Nebula}'s privacy, utility and efficiency guarantees, along with an empirical assessment on three real-world datasets. On the United States Census dataset, clients can submit their data in just 0.0036 seconds and 0.0016 MB (\textbf{efficient}), under strong $(\varepsilon=1,\delta=10^{-8})$ differential privacy guarantees (\textbf{private}), enabling \textit{Nebula}'s untrusted aggregation server to estimate histograms with over 88\% better utility than existing local differential privacy deployments (\textbf{accurate}). Additionally, we describe a variant that allows clients to submit multi-dimensional data, with similar privacy, utility, and performance. Finally, we provide an implementation of \textit{Nebula}.
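As a rough illustration of the functionality described above, the sketch below simulates clients who each opt in independently, followed by a server-side view that keeps only values whose multiplicity reaches an aggregation threshold. It models the intended output only, in the clear: there is no cryptography and no dummy data, so it carries no differential privacy guarantee by itself. The names `participate` and `thresholded_histogram`, the participation probability `p`, and the choice to reveal values with count at least `threshold` (rather than strictly greater) are assumptions made for the example.

```python
import random
from collections import Counter


def participate(values, p, rng):
    """Each client independently opts in with probability p (assumed parameter)."""
    return [v for v in values if rng.random() < p]


def thresholded_histogram(values, threshold):
    """Keep only values reported at least `threshold` times.

    Assumption: the boundary is inclusive, as in a threshold secret-sharing
    scheme where `threshold` shares suffice to reconstruct.
    """
    counts = Counter(values)
    return {v: c for v, c in counts.items() if c >= threshold}


if __name__ == "__main__":
    rng = random.Random(0)
    client_values = ["a"] * 50 + ["b"] * 20 + ["c"] * 2
    reported = participate(client_values, p=0.5, rng=rng)
    print(thresholded_histogram(reported, threshold=5))
```

Rare values such as `"c"` never reach the threshold and stay hidden from the output, which is the behaviour the thresholding step is meant to provide.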
