
Qrlew: Rewriting SQL into Differentially Private SQL

Nicolas Grislain, Paul Roussel, Victoria de Sainte Agathe

TL;DR

Qrlew makes differential privacy (DP) practical for SQL analytics: data practitioners write standard SQL queries, and Qrlew rewrites them into differentially private equivalents. It introduces a Relation-based intermediate representation, range propagation with $k$-Intervals and piecewise-monotonic functions, and a privacy unit definition that tracks ownership across related tables, all feeding into a two-phase rewriting process that yields a DP-compatible query. The rewriting first allocates a rewriting rule to each Relation and then applies those rules, replacing aggregations with DP aggregations that add Gaussian noise and protecting grouping keys with $\tau$-thresholding. Privacy loss is tracked by a Rényi differential privacy (RDP) accountant, and the rewritten query executes entirely within standard SQL back-ends. Compared with existing DP libraries and systems, Qrlew emphasizes an SQL interface, in-database DP execution, and end-to-end automatic rewriting, which reduces integration friction for real-world analytics. The paper also identifies current limitations and avenues for future work.
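To make the DP aggregation step concrete, here is a minimal Python sketch of a grouped count with Gaussian noise and $\tau$-thresholding. It is an illustration only, not Qrlew's implementation: the function name `dp_group_counts` and the parameters `sigma`, `tau`, and `seed` are hypothetical, and the calibration of `sigma` and `tau` from a privacy budget is deliberately left out. It also assumes a simplification: each privacy unit is counted at most once per group, but the number of groups a unit can touch is not bounded here.

```python
import random
from collections import defaultdict


def dp_group_counts(rows, sigma, tau, seed=None):
    """Noisy per-group counts with thresholding (illustrative sketch).

    rows  : iterable of (privacy_unit_id, group_key) pairs
    sigma : standard deviation of the Gaussian noise (assumed given)
    tau   : release threshold applied to the noisy count (assumed given)
    """
    rng = random.Random(seed)

    # Count each privacy unit at most once per group, so that one unit
    # changes any single group's count by at most 1.
    units_per_group = defaultdict(set)
    for unit_id, key in rows:
        units_per_group[key].add(unit_id)

    released = {}
    for key, units in units_per_group.items():
        noisy = len(units) + rng.gauss(0.0, sigma)
        # Tau-thresholding: only publish a grouping key when its noisy
        # count is large enough, so rare keys are not revealed.
        if noisy > tau:
            released[key] = noisy
    return released


# Hypothetical data: (user_id, city) pairs; user 1 appears twice in "paris".
rows = [(1, "paris"), (2, "paris"), (3, "paris"), (1, "paris"), (4, "lyon")]
print(dp_group_counts(rows, sigma=1.0, tau=2.0, seed=0))
```

In a real deployment `sigma` and `tau` would be derived from the privacy parameters and the contribution bounds, which is the part a privacy accountant is responsible for.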

Abstract

This paper introduces Qrlew, an open source library that parses SQL queries into Relations, an intermediate representation that keeps track of rich data types, value ranges, and row ownership, so that the queries can easily be rewritten into differentially private equivalents and turned back into SQL for execution in a variety of standard data stores. With Qrlew, a data practitioner can express their data queries in standard SQL; the data owner can run the rewritten query without any technical integration and with strong privacy guarantees on the output; and the query rewriting can be operated by a privacy expert who must be trusted by the owner, but may belong to a separate organization.

Paper Structure (17 sections, 4 equations, 5 figures)

Figures (5)

  • Figure 1: The rewriting process occurs in three stages: the data practitioner's query is parsed into a Relation, which is rewritten into a DP equivalent and finally executed by the data owner, who returns the privacy-safe result.
  • Figure 2: Relation (Map) associated with the query: SELECT a, count(abs(10*a+b)) AS x FROM table_1 WHERE b>-0.1 AND a IN (1,2,3) GROUP BY a. The arrows point to the inputs of each Relation. Note the propagation of the data type ranges.
  • Figure 3: Example of a privacy unit definition for a database with three tables holding user, order and item records. Each user is protected individually by designating their id as the PID. Orders are attached to a user through the foreign key user_id. Item ownership is defined the same way, by specifying the lineage: item -> order -> user.
  • Figure 4: The rewriting process happens in two phases: a rewriting rule allocation phase, where each node in the computation graph gets allocated a rewriting rule (RR) compatible with its input and with the desired output property; and a rule application phase, where each Relation is rewritten according to its allocated RR.
  • Figure 5: The rewriting happens in three steps: Rule Setting, when we assign the set of potential rewriting rules to each Relation in a computation graph; Rule Elimination, when only feasible rewriting rules are preserved; and Rule Selection, when an actual allocation is selected.
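
The range propagation mentioned in the Figure 2 caption can be sketched with plain interval arithmetic. The Python below is a simplified illustration, not Qrlew's $k$-Interval type: the `Interval` class and its methods are hypothetical, it tracks a single closed interval rather than a union of intervals, and the declared column ranges (a in [0, 10], b in [-1, 1]) are assumptions, since the captions do not state them. The strict filter b > -0.1 is over-approximated by a closed lower bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] used to bound the values of an expression."""
    lo: float
    hi: float

    def intersect(self, other):
        # Effect of a filter: keep only values allowed by both bounds.
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

    def scale(self, k):
        # Multiplying by a constant is monotonic; sort handles negative k.
        lo, hi = sorted((k * self.lo, k * self.hi))
        return Interval(lo, hi)

    def add(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def abs(self):
        # abs is piecewise monotonic: decreasing below 0, increasing above.
        if self.lo >= 0:
            return self
        if self.hi <= 0:
            return Interval(-self.hi, -self.lo)
        return Interval(0.0, max(-self.lo, self.hi))


# Assumed declared ranges for the columns of table_1.
a = Interval(0, 10).intersect(Interval(1, 3))                 # a IN (1,2,3)
b = Interval(-1, 1).intersect(Interval(-0.1, float("inf")))   # b > -0.1

# Range of abs(10*a + b), the argument of count(...) in the query.
expr = a.scale(10).add(b).abs()
print(expr)
```

Under these assumed input ranges, the filters in the WHERE clause tighten the bounds of a and b before they flow into the arithmetic, which is what lets the bounds on the final expression stay narrow.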