PriPL-Tree: Accurate Range Query for Arbitrary Distribution under Local Differential Privacy

Leixia Wang, Qingqing Ye, Haibo Hu, Xiaofeng Meng

TL;DR

PriPL-Tree tackles range queries under Local Differential Privacy (LDP) by replacing the usual assumption of uniform data within each domain partition with a piecewise linear (PL) model of the data distribution. The method combines a private PL fitting phase, a PriPL-Tree construction phase, and a refinement phase to produce accurate, low-noise frequency estimates, then extends to multi-dimensional queries using data-aware adaptive grids. The key contributions are the PriPL-Tree itself, which reduces both non-uniformity errors and LDP noise errors, and the adaptive 2-D grid extension with consistency refinement, which handles higher dimensions. Extensive experiments on real and synthetic data show substantial accuracy improvements over state-of-the-art methods, especially for non-uniform distributions, supporting the practical value of modeling distributions with PL segments under LDP.

Abstract

Answering range queries in the context of Local Differential Privacy (LDP) is a widely studied problem in Online Analytical Processing (OLAP). Existing LDP solutions all assume a uniform data distribution within each domain partition, which may not align with real-world scenarios where the data distribution varies, resulting in inaccurate estimates. To address this problem, we introduce PriPL-Tree, a novel data structure that combines hierarchical tree structures with piecewise linear (PL) functions to answer range queries for arbitrary distributions. PriPL-Tree precisely models the underlying data distribution with a few line segments, leading to more accurate results for range queries. Furthermore, we extend it to multi-dimensional cases with novel data-aware adaptive grids. These grids leverage the insights from marginal distributions obtained through PriPL-Trees to partition the grids adaptively, adapting to the density of the underlying distributions. Our extensive experiments on both real and synthetic datasets demonstrate the effectiveness and superiority of PriPL-Tree over state-of-the-art solutions in answering range queries across arbitrary data distributions.
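
To make the non-uniformity error concrete, here is a minimal Python sketch. It is not the authors' algorithm, and it involves no LDP perturbation. It only compares two ways of answering a range query that partially overlaps one partition: scaling the partition's total frequency by the overlap (the uniform assumption), and integrating a linear density over the overlap (the idea behind a PL segment). The function names, and the toy density f(x) = 2x on [0, 1), are assumptions chosen for illustration.

```python
def uniform_range(lo, hi, freq, a, b):
    """Estimate the frequency in [a, b) for a partition [lo, hi) with total
    frequency `freq`, assuming data is uniform inside the partition."""
    overlap = max(0.0, min(b, hi) - max(a, lo))
    return freq * overlap / (hi - lo)


def linear_range(lo, hi, slope, intercept, a, b):
    """Estimate the frequency in [a, b) for a partition [lo, hi) whose density
    is modeled by one line segment f(x) = slope * x + intercept."""
    left, right = max(a, lo), min(b, hi)
    if right <= left:
        return 0.0
    # Closed-form integral of the line over [left, right).
    return 0.5 * slope * (right**2 - left**2) + intercept * (right - left)


# Toy example (hypothetical): density f(x) = 2x on [0, 1), total frequency 1.
# The true frequency of the query [0, 0.5) is 0.25.
uniform_estimate = uniform_range(0.0, 1.0, 1.0, 0.0, 0.5)
linear_estimate = linear_range(0.0, 1.0, 2.0, 0.0, 0.0, 0.5)
print(uniform_estimate, linear_estimate)
```

In this toy case the uniform assumption reports twice the true frequency, while the linear segment recovers it exactly because the density happens to be linear. For real data, a small number of segments can only approximate the distribution.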

Paper Structure

This paper contains 51 sections, 3 theorems, 14 equations, 21 figures, 9 tables, 4 algorithms.

Key Result

theorem 1

Given a PriPL-Tree with at most $K$ segments (corresponding to $K$ leaf nodes), the error variance of frequencies after weight averaging in refinement (phase 3) is $O\left(\frac{K\cdot \log K}{(1-\alpha) \cdot (K+1) \cdot N \cdot \epsilon^2}\right)$ for non-leaf nodes and $O\left(\frac{\log K}{(1-\a

Figures (21)

  • Figure 1: An Illustration on Non-uniform Errors
  • Figure 2: An Example of PriPL-Tree and HT
  • Figure 3: Workflow of PriPL-Tree
  • Figure 4: An Example of Tree Construction ($N'= N(1-\alpha)$)
  • Figure 5: Examples of Adaptive Grids (Black solid lines represent partitions inherited from PriPL-Trees; blue solid lines indicate newly added partitions; red dashed lines indicate deleted partitions.)
  • ...and 16 more figures

Theorems & Definitions (4)

  • definition 1: $\epsilon$-Local Differential Privacy ($\epsilon$-LDP) [duchi2018minimax]
  • theorem 1
  • theorem 2
  • lemma 1
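
For reference, definition 1 is listed above by name only. The standard statement of $\epsilon$-LDP, given here in its usual textbook form and with notation that may differ from the paper's, is that a randomized mechanism $\mathcal{M}$ satisfies $\epsilon$-LDP if, for any two inputs $v, v'$ and any output $y$,

$$\Pr[\mathcal{M}(v) = y] \le e^{\epsilon} \cdot \Pr[\mathcal{M}(v') = y].$$

A smaller $\epsilon$ makes the output distributions for any two inputs harder to tell apart, which strengthens privacy and increases the noise in the collected frequencies.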