DPSQL+: A Differentially Private SQL Library with a Minimum Frequency Rule

Tomoya Matsumoto, Shokichi Takakura, Shun Takagi, Satoshi Hasegawa

TL;DR

DPSQL+ is a privacy-preserving SQL library that simultaneously enforces user-level $(\varepsilon,\delta)$-DP and the minimum frequency rule, and that allows substantially more queries under a fixed global privacy budget than the prior libraries in the evaluation.

Abstract

SQL is the de facto interface for exploratory data analysis; however, releasing exact query results can expose sensitive information through membership or attribute inference attacks. Differential privacy (DP) provides rigorous privacy guarantees, but in practice DP alone may not satisfy governance requirements such as the minimum frequency rule, which requires each released group (cell) to include contributions from at least $k$ distinct individuals. In this paper, we present DPSQL+, a privacy-preserving SQL library that simultaneously enforces user-level $(\varepsilon,\delta)$-DP and the minimum frequency rule. DPSQL+ adopts a modular architecture consisting of: (i) a Validator that statically restricts queries to a DP-safe subset of SQL; (ii) an Accountant that consistently tracks cumulative privacy loss across multiple queries; and (iii) a Backend that interfaces with various database engines, ensuring portability and extensibility. Experiments on the TPC-H benchmark demonstrate that DPSQL+ achieves practical accuracy across a wide range of analytical workloads, from basic aggregates to quadratic statistics and join operations, and allows substantially more queries under a fixed global privacy budget than prior libraries in our evaluation.
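The following minimal sketch illustrates the three-component split described above. Everything in it is an assumption for illustration:

- The class names (`Validator`, `Accountant`, `SQLiteBackend`, `DPSQL`) are hypothetical and are not the library's API.
- The validator is a toy regular-expression allowlist that admits only `COUNT(DISTINCT <user column>)`.
- The accountant uses basic sequential composition, which may be looser than what DPSQL+ actually uses.
- Laplace noise is drawn in ordinary floating point, without the hardening a production library needs.

```python
import random
import re
import sqlite3


class Validator:
    """Hypothetical toy validator: accepts only SELECT COUNT(DISTINCT <user_col>) FROM <table>."""

    _PATTERN = re.compile(
        r"^\s*SELECT\s+COUNT\s*\(\s*DISTINCT\s+(\w+)\s*\)\s+FROM\s+(\w+)\s*;?\s*$",
        re.IGNORECASE,
    )

    def __init__(self, user_column: str) -> None:
        self.user_column = user_column

    def check(self, sql: str) -> None:
        match = self._PATTERN.match(sql)
        # Reject anything outside the allowlisted shape, or a count over a non-user column.
        if match is None or match.group(1) != self.user_column:
            raise ValueError(f"query rejected by validator: {sql!r}")


class Accountant:
    """Tracks cumulative privacy loss with basic sequential composition (an assumption)."""

    def __init__(self, total_epsilon: float, total_delta: float) -> None:
        self.total_epsilon, self.total_delta = total_epsilon, total_delta
        self.spent_epsilon, self.spent_delta = 0.0, 0.0

    def charge(self, epsilon: float, delta: float = 0.0) -> None:
        # Small tolerance so that ten charges of 0.1 still fit in a budget of 1.0.
        if (self.spent_epsilon + epsilon > self.total_epsilon + 1e-9
                or self.spent_delta + delta > self.total_delta):
            raise RuntimeError("global privacy budget exhausted")
        self.spent_epsilon += epsilon
        self.spent_delta += delta


class SQLiteBackend:
    """Backend adapter for SQLite; other engines would implement the same execute()."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def execute(self, sql: str) -> float:
        return float(self.conn.execute(sql).fetchone()[0])


class DPSQL:
    """Wires the three components together (hypothetical, not the library's API)."""

    def __init__(self, validator, accountant, backend, rng: random.Random) -> None:
        self.validator, self.accountant = validator, accountant
        self.backend, self.rng = backend, rng

    def count_users(self, sql: str, epsilon: float) -> float:
        self.validator.check(sql)        # 1. static check: a rejected query spends nothing
        self.accountant.charge(epsilon)  # 2. pay before anything is computed or released
        true_value = self.backend.execute(sql)
        # 3. Adding or removing one user changes COUNT(DISTINCT user) by at most 1,
        #    so Laplace noise with scale 1/epsilon gives user-level epsilon-DP.
        scale = 1.0 / epsilon
        noise = self.rng.expovariate(1.0 / scale) - self.rng.expovariate(1.0 / scale)
        return true_value + noise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (custkey INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(i % 50, 10.0 * i) for i in range(200)])
    db = DPSQL(Validator("custkey"), Accountant(1.0, 1e-5),
               SQLiteBackend(conn), random.Random(0))
    print(db.count_users("SELECT COUNT(DISTINCT custkey) FROM orders", epsilon=0.1))
```

The key design point is the ordering inside `count_users`. Validating first means a rejected query costs no budget. Charging before touching the Backend means no result is ever released without being accounted for.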

Paper Structure (29 sections, 5 equations, 3 figures, 2 tables)

This paper contains 29 sections, 5 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The architecture of DPSQL+.
  • Figure 2: Mean Relative Error (%) against the ground truth as a function of the privacy parameter $(\varepsilon, 10^{-7})$. The y-axis is shown on a logarithmic scale. Qrlew does not support COUNT_DISTINCT, and SmartNoise SQL does not support COVAR, JOIN and GROUPBY_JOIN. For $\varepsilon = 0.1$, GROUPBY_JOIN is excluded because $\tau$-thresholding removes keys.
  • Figure 3: Maximum number of queries executable with fixed per-query budget $(\varepsilon, \delta)=(0.1, 10^{-7})$ under two global budgets.
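Figure 2's caption attributes the missing $\varepsilon = 0.1$ GROUPBY_JOIN results to $\tau$-thresholding. The sketch below shows that general technique, under several assumptions:

- Laplace noise is added to per-key distinct-user counts.
- Each user is first bounded to `max_keys_per_user` keys.
- The function and parameter names are hypothetical.
- $\tau$ is passed in by the caller instead of being calibrated to $\delta$, as a real system would do.

It is not DPSQL+'s implementation.

```python
import random
from collections import defaultdict


def laplace(rng: random.Random, scale: float) -> float:
    """Draw Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def tau_threshold_groupby(rows, epsilon, tau, max_keys_per_user, rng):
    """Noisy COUNT(DISTINCT user) per key, releasing only keys that clear tau.

    rows: iterable of (user_id, key) pairs.
    Returns {key: noisy_count} for the released keys.
    """
    # 1. Contribution bounding: each user keeps at most max_keys_per_user distinct
    #    keys (first-seen order here; a real system would sample them at random).
    kept = defaultdict(list)
    for user, key in rows:
        keys = kept[user]
        if key not in keys and len(keys) < max_keys_per_user:
            keys.append(key)

    # 2. Exact distinct-user count for every key that survived bounding.
    counts = defaultdict(int)
    for keys in kept.values():
        for key in keys:
            counts[key] += 1

    # 3. Removing one user changes at most max_keys_per_user counts by 1 each,
    #    so the L1 sensitivity of the count vector is max_keys_per_user.
    scale = max_keys_per_user / epsilon
    released = {}
    for key, count in counts.items():
        noisy = count + laplace(rng, scale)
        # 4. tau-thresholding: keys whose noisy count falls below tau are removed.
        if noisy >= tau:
            released[key] = noisy
    return released


if __name__ == "__main__":
    rows = [(u, "A") for u in range(1000)] + [(u, "B") for u in range(1000, 1003)]
    out = tau_threshold_groupby(rows, epsilon=1.0, tau=50.0, max_keys_per_user=1,
                                rng=random.Random(0))
    print(sorted(out))  # → ['A']
```

The noise scale is `max_keys_per_user / epsilon`. A $\tau$ calibrated for a fixed $\delta$ therefore has to grow as $\varepsilon$ shrinks, so at small $\varepsilon$ even moderately sized keys can fall below it and be removed.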

Theorems & Definitions (2)

  • Definition 1: Differential privacy [Dwork2014Algorithmic]
  • Definition 2: Minimum frequency rule (Threshold rule) [Sukasih2012Implementing; Garfinkel2023De-Identifying]
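For reference, the block below gives the standard form of Definition 1 [Dwork2014Algorithmic], with neighbouring databases taken at the user level as the abstract specifies. It also formalizes Definition 2 as the abstract states it. The notation ($M$, $D$, $D'$, $U(g)$) is introduced here for illustration and may differ from the paper's.

```latex
\textbf{Definition 1 (Differential privacy).} A randomized mechanism $M$ satisfies
$(\varepsilon,\delta)$-DP if, for all neighboring databases $D, D'$ and all measurable
sets $S$ of outputs,
\[
  \Pr[M(D) \in S] \le e^{\varepsilon}\,\Pr[M(D') \in S] + \delta .
\]
Under user-level DP, $D$ and $D'$ differ by adding or removing all records of one individual.

\textbf{Definition 2 (Minimum frequency rule).} Let $U(g)$ be the set of distinct
individuals contributing to group (cell) $g$. A release satisfies the rule with threshold $k$ if
\[
  |U(g)| \ge k \quad \text{for every released group } g .
\]
```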