Private Approximate Query over Horizontal Data Federation

Ala Eddine Laouir; Abdessamad Imine

TL;DR

The paper tackles private, scalable query answering over horizontally partitioned data by marrying Approximate Query Processing with Differential Privacy in a federated setting. It introduces a data distribution–aware cluster sampling framework, offline metadata, and a three-phase protocol (allocation, sampling, approximation) coordinated by an aggregator, with optional lightweight SMC for intermediate results. Empirical evaluation on large, real-world-like datasets shows up to $8\times$ speedups over plain-text execution while maintaining formal privacy and resilience to learning-based attacks. The approach provides a practical pathway to fast, private analytics in cross-organizational settings, and establishes a foundation for extending to more complex queries and tighter DBMS integration.

Abstract

In many real-world scenarios, multiple data providers need to collaboratively analyze their private data. The challenges of these applications, especially at big data scale, are time and resource efficiency as well as end-to-end privacy with minimal loss of accuracy. Existing approaches rely primarily on cryptography, which improves privacy but at the expense of query response time. Current big data analytics frameworks, however, require fast and accurate responses to large-scale queries, making cryptography-based solutions less suitable. In this work, we address the problem of combining Approximate Query Processing (AQP) and Differential Privacy (DP) in a private federated environment answering range queries on horizontally partitioned multidimensional data. We propose a new approach that uses a data distribution-aware online sampling technique to accelerate the execution of range queries and ensure end-to-end data privacy during and after analysis with minimal loss in accuracy. Through empirical evaluation, we show that our solution can process queries up to 8 times faster than the basic non-secure solution while maintaining accuracy, formal privacy guarantees, and resilience to learning-based attacks.
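To make the combination of AQP and DP more concrete, here is a minimal sketch, not the authors' protocol. It assumes uniform cluster sampling as a stand-in for the paper's distribution-aware sampling, a single provider holding one-dimensional values, and a caller-supplied `sensitivity`, since the paper's own sensitivity bounds are not reproduced on this page. All function and parameter names are hypothetical.

```python
import random


def approx_range_count(clusters, lo, hi, rate, rng):
    """Estimate how many values fall in [lo, hi] from a sample of clusters.

    Assumption: clusters are sampled uniformly without replacement; the
    paper describes a distribution-aware scheme that is not modeled here.
    """
    k = max(1, round(rate * len(clusters)))  # number of clusters to scan
    sampled = rng.sample(clusters, k)
    hits = sum(1 for cluster in sampled for v in cluster if lo <= v <= hi)
    # Scale the sampled count up by the inverse of the sampling fraction.
    return hits * len(clusters) / k


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_approx_count(clusters, lo, hi, rate, epsilon, sensitivity, rng):
    """Approximate range count perturbed with Laplace noise.

    Assumption: `sensitivity` is a valid bound supplied by the caller; this
    sketch does not derive it.
    """
    estimate = approx_range_count(clusters, lo, hi, rate, rng)
    return estimate + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    print(private_approx_count(data, 4, 9, rate=0.5, epsilon=1.0,
                               sensitivity=2.0, rng=rng))
```

With `rate=1.0` every cluster is scanned and the un-noised estimate equals the exact count; lower rates scan fewer clusters and trade accuracy for speed.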

Paper Structure (28 sections, 7 theorems, 45 equations, 8 figures, 1 table, 3 algorithms)

Introduction
Related Works
Preliminaries
Problem Statement
Our solution
Overview
Query Approximation and sampling
Federated protocol
Allocation phase
Sampling phase
Approximation phase
Privacy accounting
Evaluation
Setup
Dimension-based analysis
...and 13 more sections

Key Result

theorem 1

If each mechanism $M_j$ satisfies $(\epsilon_j, \delta_j)$-DP, then applying $M_1, \ldots, M_n$ sequentially satisfies $\left( \sum_{j = 1}^{n} \epsilon_j, \sum_{j = 1}^{n} \delta_j \right)$-DP.
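The bookkeeping implied by sequential composition can be sketched in a few lines. This is an illustrative helper, not code from the paper; the function names and the idea of checking against a fixed total budget are assumptions made for the example.

```python
def sequential_composition(budgets):
    """Total (epsilon, delta) spent when mechanisms run one after another.

    `budgets` is a list of (epsilon_j, delta_j) pairs, one per mechanism.
    """
    total_epsilon = sum(eps for eps, _ in budgets)
    total_delta = sum(delta for _, delta in budgets)
    return total_epsilon, total_delta


def fits_budget(budgets, max_epsilon, max_delta):
    """Check whether the composed cost stays within a total budget."""
    eps, delta = sequential_composition(budgets)
    return eps <= max_epsilon and delta <= max_delta


if __name__ == "__main__":
    spent = [(0.5, 0.0), (0.25, 0.0), (0.25, 1e-6)]
    print(sequential_composition(spent))
    print(fits_budget(spent, max_epsilon=1.0, max_delta=1e-5))
```

An accountant of this kind simply sums the per-mechanism parameters, which is why splitting a fixed budget across many sequential steps leaves less for each step.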

Figures (8)

Figure 1: Runtime cost of data sharing in SMC.
Figure 2: Count tensor
Figure 3: Protocol and Architecture
Figure 4: Dimension-based analysis
Figure 5: Sampling rate-based analysis
...and 3 more figures

Theorems & Definitions (7)

theorem 1: Sequential Composition (DP)
theorem 2: Parallel Composition (DP)
theorem 3: Post-Processing (DP)
theorem 4: Sensitivity of estimator $\Delta_{\text{Avg}(\widehat{R})}$
theorem 5: Sensitivity of sampling probability
theorem 6: Sensitivity of estimator $\mathbb{E}$
theorem 7: Dominant distance LS
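For contrast with sequential composition, the standard parallel composition rule can be sketched as follows. It assumes each mechanism runs on a disjoint subset of the records, as would be the case if every provider in a horizontal federation held different rows; whether the paper applies the rule exactly this way is not stated on this page, and the function name is hypothetical.

```python
def parallel_composition(budgets):
    """Overall (epsilon, delta) when mechanisms run on disjoint data subsets.

    `budgets` is a list of (epsilon_j, delta_j) pairs, one per subset.
    Under the disjointness assumption the cost is the per-parameter maximum
    rather than the sum.
    """
    max_epsilon = max(eps for eps, _ in budgets)
    max_delta = max(delta for _, delta in budgets)
    return max_epsilon, max_delta


if __name__ == "__main__":
    per_provider = [(0.5, 1e-6), (0.3, 0.0), (0.5, 1e-7)]
    print(parallel_composition(per_provider))
```

Because one record belongs to only one subset, it is affected by a single mechanism, so the overall cost is bounded by the most expensive mechanism rather than by the total.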
