Privacy in Theory, Bugs in Practice: Grey-Box Auditing of Differential Privacy Libraries

Tudor Cebere; David Erb; Damien Desfontaines; Aurélien Bellet; Jack Fitzsimons

TL;DR

This work tackles the mismatch between differential privacy theory and practical implementations by introducing Re:cord-play, a gray-box auditing framework that deterministically verifies data-independence in code and detects data-dependent control flow and sensitivity violations. It extends to Re:cord-play-sample for untrusted primitives, enabling statistical evaluation of individual components via Privacy Loss Distributions and end-to-end composition. Through auditing 12 open-source libraries, the framework uncovers 13 privacy-related bugs and several robustness issues, demonstrating that many real-world DP systems suffer from subtle but impactful flaws beyond their primitives. The authors provide an open-source Python package to integrate privacy testing into CI/CD, offering a practical, scalable tool to improve the reliability of privacy-preserving software and encourage responsible disclosure and adoption across the DP ecosystem.

Abstract

Differential privacy (DP) implementations are notoriously prone to errors, with subtle bugs frequently invalidating theoretical guarantees. Existing verification methods are often impractical: formal tools are too restrictive, while black-box statistical auditing is intractable for complex pipelines and fails to pinpoint the source of the bug. This paper introduces Re:cord-play, a gray-box auditing paradigm that inspects the internal state of DP algorithms. By running an instrumented algorithm on neighboring datasets with identical randomness, Re:cord-play directly checks for data-dependent control flow and provides concrete falsification of sensitivity violations by comparing declared sensitivity against the empirically measured distance between internal inputs. We generalize this to Re:cord-play-sample, a full statistical audit that isolates and tests each component, including untrusted ones. We show that our novel testing approach is both effective and necessary by auditing 12 open-source libraries, including SmartNoise SDK, Opacus, and Diffprivlib, and uncovering 13 privacy violations that impact their theoretical guarantees. We release our framework as an open-source Python package, thereby making it easy for DP developers to integrate effective, computationally inexpensive, and seamless privacy testing as part of their software development lifecycle.
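The record-and-replay idea described above can be illustrated with a minimal sketch. Every name here (`RecordingLaplace`, `audit`, the two toy sum functions) is hypothetical and is not the API of the authors' package. The sketch assumes an add/remove-one-record neighboring relation and a Laplace primitive that receives a declared sensitivity. It runs the same algorithm on two neighboring datasets with the same seed, then compares the recorded inputs of the noise primitive.

```python
import random


class RecordingLaplace:
    """Hypothetical instrumented Laplace primitive that logs its inputs."""

    def __init__(self, seed, trace):
        self.rng = random.Random(seed)  # identical seed => identical randomness
        self.trace = trace              # list of (raw value, declared sensitivity)

    def __call__(self, value, sensitivity, epsilon):
        self.trace.append((value, sensitivity))
        scale = sensitivity / epsilon
        # The difference of two i.i.d. Exp(1) draws, times scale, is Laplace(0, scale).
        noise = scale * (self.rng.expovariate(1.0) - self.rng.expovariate(1.0))
        return value + noise


def buggy_dp_sum(data, laplace, epsilon=1.0, clip=1.0):
    """Declares sensitivity `clip` but never clips the data (deliberate bug)."""
    return laplace(sum(data), sensitivity=clip, epsilon=epsilon)


def fixed_dp_sum(data, laplace, epsilon=1.0, clip=1.0):
    """Clips each record to [-clip, clip] before summing."""
    clipped = [max(-clip, min(clip, x)) for x in data]
    return laplace(sum(clipped), sensitivity=clip, epsilon=epsilon)


def audit(algorithm, d, d_prime, seed=0, tol=1e-9):
    """Record on d, replay on d_prime, and compare the primitive's inputs."""
    recorded, replayed = [], []
    algorithm(d, RecordingLaplace(seed, recorded))
    algorithm(d_prime, RecordingLaplace(seed, replayed))

    if len(recorded) != len(replayed):
        return ["data-dependent control flow: number of primitive calls differs"]

    findings = []
    for i, ((v, s), (v2, s2)) in enumerate(zip(recorded, replayed)):
        if s != s2:
            findings.append(f"call {i}: declared sensitivity depends on the data")
        if abs(v - v2) > min(s, s2) + tol:
            findings.append(
                f"call {i}: sensitivity violation, measured distance "
                f"{abs(v - v2):.3g} exceeds declared {min(s, s2):.3g}"
            )
    return findings


# Neighboring datasets: d_prime drops the last record of d.
d = [0.5, 0.2, 5.0]
d_prime = [0.5, 0.2]

buggy_findings = audit(buggy_dp_sum, d, d_prime)
fixed_findings = audit(fixed_dp_sum, d, d_prime)
print(buggy_findings)
print(fixed_findings)
```

A single failing pair of neighbors is a concrete counterexample to the declared sensitivity, whereas an empty list of findings only means that this particular pair revealed nothing. In a sketch like this the check is deterministic, which is what would make it cheap enough to run in a regular test suite.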

Paper Structure (75 sections, 5 equations, 8 figures, 3 tables, 3 algorithms)

Introduction
Contribution
Background & Related Work
Differential Privacy
Membership Inference Attacks and Auditing
Related Work
Distributional Auditing.
Optimization and Counterexample Search.
Proofs of Privacy.
Sensitivity-tracking and domain-specific languages (DSLs)
Formal Verification.
Framework
Motivation & Insights
A Structural Decomposition Approach
Re:cord-play
...and 60 more sections

Figures (8)

Figure 1: Summary of Re:cord-play highlighting the Record Phase (Dataset $D$) and Replay Phase (Dataset $D'$).
Figure 2: End-to-end testing example of a code with a sensitivity miscalibration.
Figure 3: Simplified code for the Sensitivity Bug in the SmartNoise SDK Covariance estimation. The declared sensitivity self.sens is calculated assuming censored data, but the covar function receives the original, uncensored data.
Figure 4: SmartNoise SQL missing $\log(1/\delta)$ factor in the homogeneous odometer.
Figure 5: Simplified code for the two bugs in Synthcity's DP Bayes algorithm. First, the output of the Exponential Mechanism, candidate_idx, is used to index into a private list, and the result is used in a non-private if check. Also, self.K (e.g., the number of parents) equals n_features, and the noise scale becomes zero, completely disabling privacy protection.
...and 3 more figures

Theorems & Definitions (1)

Definition 1
