Fast Private Adaptive Query Answering for Large Data Domains

Miguel Fuentes, Brett Mullins, Yingtai Xiao, Daniel Kifer, Cameron Musco, Daniel Sheldon

TL;DR

The paper tackles privately answering large numbers of marginal queries under differential privacy by using residual queries to decouple reconstruction from expensive graphical-model inference. It introduces an in-axis multi-dimensional array framework, a lazy updating scheme, and a conditional ResidualPlanner to optimize noise allocation, culminating in AIM+GReM, a fast, scalable adaptive mechanism based on GReM-MLE reconstruction. The results show AIM+GReM is orders of magnitude faster than AIM+PGM while maintaining competitive accuracy, particularly on high-dimensional domains, and it often outperforms ResidualPlanner in low-budget regimes. Collectively, these innovations enable accurate private analysis of large tabular datasets with substantially improved efficiency and scalability.

Abstract

Privately releasing marginals of a tabular dataset is a foundational problem in differential privacy. However, state-of-the-art mechanisms suffer from a computational bottleneck when marginal estimates are reconstructed from noisy measurements. Recently, residual queries were introduced and shown to lead to highly efficient reconstruction in the batch query answering setting. We introduce new techniques to integrate residual queries into state-of-the-art adaptive mechanisms such as AIM. Our contributions include a novel conceptual framework for residual queries using multi-dimensional arrays, lazy updating strategies, and adaptive optimization of the per-round privacy budget allocation. Together these contributions reduce error, improve speed, and simplify residual query operations. We integrate these innovations into a new mechanism (AIM+GReM), which improves AIM by using fast residual-based reconstruction instead of a graphical model approach. Our mechanism is orders of magnitude faster than the original framework and demonstrates competitive error and greatly improved scalability.

Paper Structure (32 sections, 13 theorems, 49 equations, 11 figures, 3 tables, 6 algorithms)

Key Result

Proposition 3.1

The function $T_\gamma\left(\mu_\gamma\right) = \left\{\textup{Decomp}\left(\mu_\gamma,\gamma,\tau\right) \mid \tau\subseteq\gamma\right\}$ is an invertible linear transformation between a marginal $\mu_\gamma$ and a set of residuals $\zeta_\gamma = \{ \zeta_{\tau} \mid \tau \subseteq \gamma \}$.
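
The following is a minimal NumPy sketch of why such a transformation can be invertible; it is not the paper's implementation. It assumes a particular residual convention that the text above does not spell out: along each attribute in $\tau$ the marginal is multiplied by a subtraction matrix with rows $e_0 - e_k$, and along each attribute in $\gamma \setminus \tau$ it is summed out. The names `sub_matrix`, `apply_along_axis`, `subsets`, `decomp`, and `recon` are hypothetical helpers introduced for illustration only. Each operation acts on one array axis at a time, in the spirit of the in-axis framework, rather than forming a Kronecker product.

```python
from itertools import chain, combinations

import numpy as np


def sub_matrix(n):
    # Assumed convention: (n-1) x n matrix whose k-th row is e_0 - e_k.
    return np.hstack([np.ones((n - 1, 1)), -np.eye(n - 1)])


def apply_along_axis(mat, arr, axis):
    # Multiply `mat` into `arr` along a single axis; other axes are untouched.
    moved = np.tensordot(mat, arr, axes=([1], [axis]))  # contracted axis comes out first
    return np.moveaxis(moved, 0, axis)


def subsets(axes):
    # All subsets of the given axes, as tuples (including the empty tuple).
    axes = list(axes)
    return chain.from_iterable(combinations(axes, r) for r in range(len(axes) + 1))


def decomp(mu, tau):
    # Residual for tau: difference along axes in tau, sum along the remaining axes.
    out = mu
    for axis, n in enumerate(mu.shape):
        mat = sub_matrix(n) if axis in tau else np.ones((1, n))
        out = apply_along_axis(mat, out, axis)
    return out


def recon(residuals, shape):
    # Invert decomp: pseudoinverse along axes in tau, uniform spreading elsewhere.
    mu = np.zeros(shape)
    for tau, zeta in residuals.items():
        term = zeta
        for axis, n in enumerate(shape):
            if axis in tau:
                mat = np.linalg.pinv(sub_matrix(n))
            else:
                mat = np.full((n, 1), 1.0 / n)
            term = apply_along_axis(mat, term, axis)
        mu += term
    return mu


# Round trip on a random 3-way "marginal" with attribute sizes 2, 3, 4.
rng = np.random.default_rng(0)
mu = rng.random((2, 3, 4))
residuals = {tau: decomp(mu, tau) for tau in subsets(range(mu.ndim))}
mu_back = recon(residuals, mu.shape)
```

Under this convention the round trip works because, for each attribute of size $n$, stacking the all-ones row on top of the subtraction matrix gives an invertible $n \times n$ matrix whose inverse has columns $\tfrac{1}{n}\mathbf{1}$ and the pseudoinverse of the subtraction matrix. The residuals also hold exactly as many numbers in total as the marginal has cells, which is consistent with an invertible map.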

Figures (11)

  • Figure 1: Example Marginals and Residual.
  • Figure 2: $\textup{Decomp}\left(\mu,\gamma,\tau\right)$
  • Figure 3: Running time in seconds of the $\textup{Decomp}$ and $\textup{Recon}$ operations with our in-axis implementation versus the baseline Kronecker-based approach.
  • Figure 4: Mean $L_1$ error of AIM variants by running time on all 3-way marginals, relative to ResidualPlanner. Results are averaged across five trials. Privacy budgets $\epsilon$ are indicated in the vertical scale, with $\delta = 10^{-9}$. AIM+PGM uses a 50MB model. For 2D plots with absolute units, see the appendix on additional experiments.
  • Figure 5: Running time in seconds of the $\textup{Decomp}$ and $\textup{Recon}$ operations with our in-axis implementation versus the baseline Kronecker-based approach.
  • ...and 6 more figures

Theorems & Definitions (18)

  • Definition 2.1
  • Proposition 3.1
  • Theorem 3.2
  • Theorem 4.1
  • Proposition 4.2
  • proof
  • Proposition 4.3
  • Proposition 4.4
  • Theorem 5.1
  • Definition B.1
  • ...and 8 more