Accurate and Scalable Matrix Mechanisms via Divide and Conquer

Guanlin He, Yingtai Xiao, Jiamu Bai, Xin Gu, Zeyu Ding, Wenpeng Yin, Daniel Kifer

Abstract

Matrix mechanisms are often used to provide unbiased differentially private query answers when publishing statistics or creating synthetic data. Recent work has developed matrix mechanisms, such as ResidualPlanner and Weighted Fourier Factorizations, that scale to high-dimensional datasets while providing optimality guarantees for workloads such as marginals and circular product queries. They operate by adding noise to a linearly independent set of queries that can compactly represent the desired workloads. In this paper, we present QuerySmasher, an alternative scalable approach based on a divide-and-conquer strategy. Given a workload that can be answered from various data marginals, QuerySmasher splits each query into sub-queries and re-assembles the pieces into mutually orthogonal sub-workloads. These sub-workloads represent small, low-dimensional problems that can be independently and optimally answered by existing low-dimensional matrix mechanisms. QuerySmasher then stitches these solutions together to answer queries in the original workload. We show that QuerySmasher subsumes prior work, such as ResidualPlanner (RP), ResidualPlanner+ (RP+), and Weighted Fourier Factorizations (WFF). We prove that it can dominate those approaches, under sum squared error, for all workloads. We also experimentally demonstrate the scalability and accuracy of QuerySmasher.

Paper Structure

This paper contains 26 sections, 2 theorems, 30 equations, 2 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

Let $\mathcal{M}_1,\dots,\mathcal{M}_k$ be a collection of correlated Gaussian linear mechanisms with respective strategy matrices $\mathbf{B}_1,\dots,\mathbf{B}_k$ and noise covariance matrices …

Figures (2)

  • Figure 1: A query answerable by the marginal on $\mathcal{A}=\{A_1,A_2\}$ with domain($A_1$) = $\{$yes, no$\}$ and domain($A_2$) = $\{$a, b, c$\}$. The tensor form $\mathbf{q}$ (left) and vector form $\vec{\mathbf{q}}$ (right) are shown. The query asks for the number of records that satisfy ($A_1=$ yes, $A_2=$ b) $\vee$ ($A_1=$ yes, $A_2=$ c) $\vee$ ($A_1=$ no, $A_2=$ c).
  • Figure 2: The decomposition of $\mathbf{q}$ from Figure 1 into the subqueries $\mathbf{q}_{\Rightarrow\{\}}$, $\mathbf{q}_{\Rightarrow\{A_1\}}$, $\mathbf{q}_{\Rightarrow\{A_2\}}$, and $\mathbf{q}_{\Rightarrow\{A_1,A_2\}}$.
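
The query in Figure 1 and a decomposition of the kind Figure 2 describes can be sketched in a few lines. This is an illustrative sketch only: the row-major flattening used for the vector form, the `decompose` and `inner` helpers, and the ANOVA-style split into a grand mean, per-attribute main effects, and an interaction term are assumptions made here, not the paper's stated construction. The sketch shows that one query tensor can be split into one piece per attribute subset, that the pieces are mutually orthogonal, and that they sum back to the original query.

```python
from fractions import Fraction
from itertools import combinations

# Attribute domains and the satisfied cells, taken from the Figure 1 caption.
A1 = ["yes", "no"]
A2 = ["a", "b", "c"]
SELECTED = {("yes", "b"), ("yes", "c"), ("no", "c")}

# Tensor form: rows indexed by A1, columns by A2 (exact rational arithmetic).
q = [[Fraction(int((x, y) in SELECTED)) for y in A2] for x in A1]

# Vector form, ASSUMING row-major flattening (the figure itself is not shown).
vec_q = [v for row in q for v in row]


def decompose(t):
    """Hypothetical ANOVA-style split of a 2-D tensor, keyed by attribute subset."""
    n1, n2 = len(t), len(t[0])
    grand = sum(sum(row) for row in t) / (n1 * n2)
    row_eff = [sum(t[i]) / n2 - grand for i in range(n1)]
    col_eff = [sum(t[i][j] for i in range(n1)) / n1 - grand for j in range(n2)]
    return {
        (): [[grand] * n2 for _ in range(n1)],
        ("A1",): [[row_eff[i]] * n2 for i in range(n1)],
        ("A2",): [[col_eff[j] for j in range(n2)] for _ in range(n1)],
        ("A1", "A2"): [
            [t[i][j] - grand - row_eff[i] - col_eff[j] for j in range(n2)]
            for i in range(n1)
        ],
    }


def inner(s, t):
    """Entrywise (Frobenius) inner product of two equally shaped tensors."""
    return sum(a * b for rs, rt in zip(s, t) for a, b in zip(rs, rt))


parts = decompose(q)

# The pieces add back up to the original query tensor.
rebuilt = [
    [sum(p[i][j] for p in parts.values()) for j in range(len(A2))]
    for i in range(len(A1))
]

# Every pair of pieces has inner product zero.
orthogonal = all(inner(parts[a], parts[b]) == 0 for a, b in combinations(parts, 2))
```

Exact `Fraction` arithmetic is used so that the orthogonality and reassembly checks hold exactly rather than up to floating-point tolerance.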

Theorems & Definitions (7)

  • Definition 1: Approximate DP [dworkKMM06:ourdata]
  • Definition 2: Gaussian DP [fdp]
  • Definition 3: Correlated Gaussian Linear Mechanism, Privacy Cost [commonmech]
  • Theorem 1 [commonmech]
  • Theorem 2
  • Definition 4
  • Definition 5