Parallel Composition for Statistical Privacy

Dennis Breutigam, Rüdiger Reischuk

TL;DR

The paper introduces Statistical Privacy (SP) as a realistic relaxation of Differential Privacy (DP) that accounts for known but non-identical distributions of database entries. It develops a partition-based subsampling framework, including templates, SAMP operators, and the Sampling Privacy Curve, to derive composition bounds for both nonadaptive and adaptive query sequences under limited background knowledge. The results demonstrate sharper composition bounds than prior distribution-based privacy and show that incorporating distribution entropy can enable more queries with better utility than DP in many scenarios. Practically, SP offers a framework for deploying privacy-preserving analytics when internal data entropy can be harnessed through sampling, reducing the need for heavy external noise. Overall, SP provides a tractable, entropy-aware approach to multi-query privacy with broad applicability to non-identically distributed data and varied background knowledge.

Abstract

Differential Privacy (DP) considers a scenario in which an adversary has almost complete information about the entries of a database. This worst-case assumption is likely to overestimate the privacy threat faced by an individual in practice. In contrast, Statistical Privacy (SP), as well as related notions such as noiseless privacy or limited background knowledge privacy, describes a setting in which the adversary knows the distribution of the database entries, but not their exact realizations. In this case, privacy analysis must account for the interaction between the uncertainty induced by the entropy of the underlying distributions and the privacy mechanisms that distort query answers, and this interaction can be highly non-trivial. This paper investigates this problem for multiple queries (composition). A privacy mechanism is proposed that is based on subsampling and randomly partitioning the database to bound the dependency among queries. In this way, for the first time to the best of our knowledge, upper privacy bounds against limited adversaries are obtained without any further restriction on the database. These bounds show that, in realistic application scenarios, taking the entropy of distributions into account yields improvements in privacy and precision guarantees. We illustrate examples where, for fixed privacy parameters and utility loss, SP allows significantly more queries than DP.
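The partitioning idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's SAMP operator or its Random Partition definition, and it carries no privacy guarantee. It assumes the simplest reading: the database indices are shuffled uniformly at random and split into as many disjoint blocks as there are queries, and each query only sees its own block, so no entry influences more than one answer. The names `random_partition` and `answer_on_partition` are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def random_partition(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Split indices 0..n-1 uniformly at random into k disjoint blocks.

    Assumption for illustration: blocks are as equal in size as possible.
    """
    indices = list(range(n))
    rng.shuffle(indices)
    # Block j takes every k-th shuffled index, so sizes differ by at most 1.
    return [indices[j::k] for j in range(k)]


def answer_on_partition(
    database: Sequence[int],
    queries: Sequence[Callable[[List[int]], int]],
    rng: random.Random,
) -> List[int]:
    """Answer each query on its own block of a random partition.

    Every entry is used by exactly one query, which is the property the
    partitioning is meant to provide (hypothetical helper, no noise added).
    """
    blocks = random_partition(len(database), len(queries), rng)
    return [
        query([database[i] for i in block])
        for query, block in zip(queries, blocks)
    ]


if __name__ == "__main__":
    rng = random.Random(2024)
    # Toy database of binary attributes drawn from a known distribution.
    db = [1 if rng.random() < 0.3 else 0 for _ in range(1000)]
    print(answer_on_partition(db, [sum, sum, len], rng))
```

Each answer here is computed from roughly `n / k` entries, which makes the trade-off visible: more queries mean smaller blocks and therefore less precise per-query answers.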

Paper Structure

This paper contains 10 sections, 7 theorems, 12 equations, 2 tables.

Key Result

Lemma 3.1

For a database distribution $\mu$, a query $F$, a sampling technique $\mathcal{T}$ and $S \subseteq A$ it holds

Theorems & Definitions (16)

  • Definition 2.1: Differential Privacy (DP)
  • Definition 2.2: Statistical Privacy (SP)
  • Definition 2.3
  • Definition 3.1: Templates
  • Definition 3.2: Sampling Technique
  • Lemma 3.1
  • Definition 3.3: Sampling Privacy Curve (SPC)
  • Definition 3.4: Random Partition
  • Example 1
  • Lemma 5.1
  • ...and 6 more