Privacy in Theory, Bugs in Practice: Grey-Box Auditing of Differential Privacy Libraries
Tudor Cebere, David Erb, Damien Desfontaines, Aurélien Bellet, Jack Fitzsimons
TL;DR
This work tackles the mismatch between differential privacy theory and practical implementations by introducing Re:cord-play, a grey-box auditing framework that deterministically checks whether code is data-independent, detecting data-dependent control flow and sensitivity violations. Its extension, Re:cord-play-sample, handles untrusted primitives by evaluating individual components statistically via Privacy Loss Distributions and composing the results end to end. Auditing 12 open-source libraries with the framework uncovers 13 privacy-related bugs and several robustness issues, showing that many real-world DP systems suffer from subtle but impactful flaws outside their primitives. The authors release an open-source Python package for integrating privacy testing into CI/CD pipelines. It offers a practical, scalable way to improve the reliability of privacy-preserving software and to encourage responsible disclosure and adoption across the DP ecosystem.
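To make the deterministic check concrete, here is a minimal sketch of the record-and-replay idea, assuming add/remove neighbors, scalar pre-noise values, and a Laplace noise primitive. It runs an algorithm twice, once on a dataset and once on a neighbor, with identically seeded randomness. It then compares the sequences of noise calls (for control flow) and the distance between the recorded pre-noise inputs (against the declared sensitivity). All names here (`Recorder`, `record_play`, `clipped_sum`, `unclipped_sum`, `data_dependent_branch`) are hypothetical illustrations, not the API of the released package.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Recorder:
    """Hypothetical instrumented noise primitive that logs every call it receives."""
    rng: random.Random
    trace: list = field(default_factory=list)

    def laplace(self, value: float, sensitivity: float, epsilon: float, site: str) -> float:
        # Log the call site, the pre-noise value, and the declared sensitivity.
        self.trace.append((site, value, sensitivity))
        # Laplace(scale) noise as the difference of two exponentials with mean `scale`.
        scale = sensitivity / epsilon
        return value + self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)


def record_play(algorithm, data, neighbor, seed=0):
    """Run `algorithm` on two neighboring datasets with identical randomness."""
    rec_a, rec_b = Recorder(random.Random(seed)), Recorder(random.Random(seed))
    algorithm(data, rec_a)
    algorithm(neighbor, rec_b)
    # Different sequences of noise calls mean control flow depended on the data.
    if [c[0] for c in rec_a.trace] != [c[0] for c in rec_b.trace]:
        return ["data-dependent control flow"]
    issues = []
    for (site, x, sens), (_, y, _) in zip(rec_a.trace, rec_b.trace):
        # A measured input distance above the declared sensitivity falsifies the claim.
        if abs(x - y) > sens + 1e-9:
            issues.append(f"sensitivity violation at {site}: |{x} - {y}| > {sens}")
    return issues


def clipped_sum(data, rec):
    # Each value is clipped to [0, 1], so adding one record moves the sum by at most 1.
    return rec.laplace(sum(min(max(v, 0.0), 1.0) for v in data), 1.0, 1.0, site="sum")


def unclipped_sum(data, rec):
    # Bug: declares sensitivity 1 but never clips.
    return rec.laplace(sum(data), 1.0, 1.0, site="sum")


def data_dependent_branch(data, rec):
    # Bug: whether noise is added at all depends on the data.
    if len(data) > 3:
        return rec.laplace(float(len(data)), 1.0, 1.0, site="count")
    return float(len(data))


data = [0.2, 0.9, 5.0]
neighbor = data + [7.0]  # add-one neighboring dataset
for alg in (clipped_sum, unclipped_sum, data_dependent_branch):
    print(alg.__name__, record_play(alg, data, neighbor))
```

Sharing the seed matters. When control flow matches, both runs consume the same random draws, so any divergence in the recorded inputs comes from the data rather than from the noise. A single pair of neighbors can only falsify a sensitivity claim, never prove it, which is why the check is framed as falsification.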
Abstract
Differential privacy (DP) implementations are notoriously prone to errors, with subtle bugs frequently invalidating theoretical guarantees. Existing verification methods are often impractical: formal tools are too restrictive, while black-box statistical auditing is intractable for complex pipelines and cannot pinpoint the source of a bug. This paper introduces Re:cord-play, a grey-box auditing paradigm that inspects the internal state of DP algorithms. By running an instrumented algorithm on neighboring datasets with identical randomness, Re:cord-play directly checks for data-dependent control flow and concretely falsifies sensitivity claims by comparing the declared sensitivity against the empirically measured distance between internal inputs. We generalize this approach to Re:cord-play-sample, a full statistical audit that isolates and tests each component, including untrusted ones. We show that our testing approach is both effective and necessary by auditing 12 open-source libraries, including SmartNoise SDK, Opacus, and Diffprivlib, and uncovering 13 privacy violations that affect their theoretical guarantees. We release our framework as an open-source Python package, making it easy for DP developers to integrate effective, computationally inexpensive, and seamless privacy testing into their software development lifecycle.
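Re:cord-play-sample tests individual components statistically, including untrusted noise primitives. The paper does this with Privacy Loss Distributions. The sketch below is a much simpler stand-in, not the paper's procedure. Given two pre-noise inputs `x` and `x_prime` that differ by at most the declared sensitivity (for instance, values recorded on neighboring datasets), it samples the primitive on each input, bins the outputs, and estimates the hockey-stick divergence δ(ε) in both directions. Because binning is post-processing, a correct mechanism cannot exceed its δ on binned outputs. The finite-sample estimate still carries noise, so an assumed slack `tol` is added. The names `laplace_primitive`, `buggy_primitive`, `histogram`, `hockey_stick`, and `audit_primitive` are hypothetical. The bin range defaults assume inputs near 0 and 1 with noise scale around 1.

```python
import math
import random
from collections import Counter


def laplace_primitive(value, epsilon, sensitivity, rng):
    # Correct Laplace mechanism: scale = sensitivity / epsilon.
    scale = sensitivity / epsilon
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def buggy_primitive(value, epsilon, sensitivity, rng):
    # Hypothetical bug: noise is 4x too small, so the real epsilon is 4x the claimed one.
    scale = sensitivity / (4 * epsilon)
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def histogram(samples, lo, hi, n_bins):
    # Bin outputs; values outside [lo, hi) are clamped into the edge bins (post-processing).
    width = (hi - lo) / n_bins
    counts = Counter(min(n_bins - 1, max(0, int((s - lo) // width))) for s in samples)
    return [counts[i] / len(samples) for i in range(n_bins)]


def hockey_stick(p, q, epsilon):
    # Empirical delta(epsilon) = sum over bins of max(0, P(b) - e^epsilon * Q(b)).
    return sum(max(0.0, pb - math.exp(epsilon) * qb) for pb, qb in zip(p, q))


def audit_primitive(primitive, x, x_prime, epsilon, sensitivity, delta=0.0,
                    n=50_000, lo=-6.0, hi=7.0, n_bins=26, tol=0.05, seed=1):
    """Sample the primitive on x and x_prime and estimate its delta at epsilon."""
    rng = random.Random(seed)
    p = histogram([primitive(x, epsilon, sensitivity, rng) for _ in range(n)], lo, hi, n_bins)
    q = histogram([primitive(x_prime, epsilon, sensitivity, rng) for _ in range(n)], lo, hi, n_bins)
    # Check both directions, since the DP inequality must hold for either ordering.
    d_hat = max(hockey_stick(p, q, epsilon), hockey_stick(q, p, epsilon))
    # `tol` absorbs sampling error; it is an assumed slack, not a calibrated bound.
    return d_hat, d_hat <= delta + tol


for prim in (laplace_primitive, buggy_primitive):
    d_hat, passed = audit_primitive(prim, 0.0, 1.0, epsilon=1.0, sensitivity=1.0)
    print(prim.__name__, round(d_hat, 3), "pass" if passed else "FAIL")
```

For the buggy primitive, whose effective ε is 4, the exact continuous δ at ε = 1 is 1 − e^(−1.5) ≈ 0.78, so its estimate lands far above the tolerance. The correct Laplace primitive should only show small sampling noise. A real audit would calibrate `tol` with a confidence bound rather than fixing it by hand.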
