Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition

Chenxi Qiu

TL;DR

LP-based Metric Differential Privacy (mDP) scales poorly because of the large number of decision variables and constraints. The paper introduces a computation framework that partitions the secret dataset via an mDP graph and applies a two-stage Benders decomposition (BD), consisting of a master program and a set of subproblems, to solve the resulting PMO efficiently. The key contributions are a Distance-Vector-based secret dataset partitioning method that preserves strong intra-subset mDP constraints while balancing subset sizes, a block-ladder PMO reformulation amenable to BD, and experiments across geo-location, text, and synthetic data demonstrating up to $9\times$ scalability gains. This approach enables near-optimal mDP perturbation for large-scale metric spaces, with practical impact on privacy-preserving data publishing in real-world applications.

Abstract

Metric Differential Privacy (mDP) extends the concept of Differential Privacy (DP) to serve as a new paradigm of data perturbation. It is designed to protect secret data represented in general metric space, such as text data encoded as word embeddings or geo-location data on the road network or grid maps. To derive an optimal data perturbation mechanism under mDP, a widely used method is linear programming (LP), which, however, might suffer from a polynomial explosion of decision variables, rendering it impractical in large-scale mDP. In this paper, our objective is to develop a new computation framework to enhance the scalability of the LP-based mDP. Considering the connections established by the mDP constraints among the secret records, we partition the original secret dataset into various subsets. Building upon the partition, we reformulate the LP problem for mDP and solve it via Benders Decomposition, which is composed of two stages: (1) a master program to manage the perturbation calculation across subsets and (2) a set of subproblems, each managing the perturbation derivation within a subset. Our experimental results on multiple datasets, including geo-location data in the road network/grid maps, text data, and synthetic data, underscore our proposed mechanism's superior scalability and efficiency.
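To make the scalability issue concrete, the following is a minimal sketch of a standard LP formulation of mDP, not the paper's exact PMO. It assumes secret records on a line, a perturbed-output set equal to the secret set, absolute distance as both the metric and the utility cost, and a uniform prior; the function name `solve_mdp_lp` and all parameter values are hypothetical. With $n$ records, this formulation has $n^2$ decision variables and $n^2(n-1)$ mDP constraints of the form $z_{i,k} \le e^{\epsilon d(i,j)} z_{j,k}$.

```python
import numpy as np
from scipy.optimize import linprog


def solve_mdp_lp(points, prior, eps):
    """Solve a small mDP linear program (illustrative assumptions only).

    z[i, k] is the probability of reporting record k when the secret is i,
    flattened so that z[i, k] sits at index i * n + k.
    """
    n = len(points)
    dist = np.abs(points[:, None] - points[None, :])  # d(i, j) on a line

    # Objective: expected distortion sum_i prior[i] * sum_k z[i, k] * d(i, k).
    cost = (prior[:, None] * dist).ravel()

    # mDP constraints: z[i, k] - exp(eps * d(i, j)) * z[j, k] <= 0.
    rows = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(n):
                row = np.zeros(n * n)
                row[i * n + k] = 1.0
                row[j * n + k] = -np.exp(eps * dist[i, j])
                rows.append(row)
    A_ub = np.array(rows)
    b_ub = np.zeros(len(rows))

    # Each row of the perturbation matrix is a probability distribution.
    A_eq = np.kron(np.eye(n), np.ones((1, n)))
    b_eq = np.ones(n)

    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, 1), method="highs")
    return res.x.reshape(n, n), res.fun


points = np.array([0.0, 1.0, 2.0, 3.0])  # hypothetical secret records
prior = np.full(4, 0.25)                 # assumed uniform prior
Z, loss = solve_mdp_lp(points, prior, eps=0.5)
print(Z.shape, round(loss, 4))
```

Even in this toy setting the dense constraint matrix has $n^2(n-1)$ rows and $n^2$ columns, which is why solving the LP directly becomes impractical as the secret dataset grows.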

Paper Structure (31 sections, 1 theorem, 24 equations, 14 figures, 5 tables)

This paper contains 31 sections, 1 theorem, 24 equations, 14 figures, and 5 tables.

Key Result

Proposition 3.3

(Upper and lower bounds of the PMO's optimum) [Rahmaniani-EJOR2017] (1) The optimal value of the master program (MP) (Eq. (eq:MPObj)-(eq:MPzy0)) is a lower bound on the optimal value of the original PMO (Eq. (eq:LPobjective)-(eq:LPconstraint1)), because the MP relaxes the constraints of the PMO. (2) The solution o
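The lower-bound argument in part (1) rests on a general fact: dropping constraints from a minimization LP can only lower, or leave unchanged, its optimal value. The sketch below illustrates that fact on a toy mDP-style LP. It is not the paper's master program. It only compares a fully constrained LP with a relaxation that keeps the mDP constraints between records in the same subset. The partition, the helper name `mdp_lp_value`, and all numeric values are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def mdp_lp_value(points, prior, eps, keep_pair):
    """Optimal expected distortion when only pairs (i, j) for which
    keep_pair(i, j) is True carry an mDP constraint."""
    n = len(points)
    dist = np.abs(points[:, None] - points[None, :])
    cost = (prior[:, None] * dist).ravel()

    rows = []
    for i in range(n):
        for j in range(n):
            if i == j or not keep_pair(i, j):
                continue
            for k in range(n):
                # z[i, k] - exp(eps * d(i, j)) * z[j, k] <= 0
                row = np.zeros(n * n)
                row[i * n + k] = 1.0
                row[j * n + k] = -np.exp(eps * dist[i, j])
                rows.append(row)

    A_eq = np.kron(np.eye(n), np.ones((1, n)))  # rows of z sum to 1
    res = linprog(cost, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=A_eq, b_eq=np.ones(n), bounds=(0, 1), method="highs")
    return res.fun


points = np.array([0.0, 1.0, 2.0, 3.0])
prior = np.full(4, 0.25)
subset = np.array([0, 0, 1, 1])  # hypothetical partition into two subsets

full = mdp_lp_value(points, prior, 0.5, lambda i, j: True)
relaxed = mdp_lp_value(points, prior, 0.5,
                       lambda i, j: subset[i] == subset[j])
print(relaxed <= full + 1e-9)
```

Every solution feasible for the full problem is also feasible for the relaxed one, so the relaxed optimum cannot exceed the full optimum.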

Figures (14)

  • Figure 1: Comparison of the secret dataset size in the related LP-based mDP works and our work: CCS 2012 [Shokri-CCS2012], CCS 2014 [Fawaz-CCS2014], ICDM 2016 [Wang-CIDM2016], WWW 2017 [Wang-WWW2017], NDSS 2017 [Yu-NDSS2017], ICDCS 2019 [Qiu-ICDCS2019], CIKM 2020 [Qiu-CIKM2020], TMC 2022 [Qiu-TMC2022], SIGSPATIAL 2022 [Qiu-SIGSPATIAL2022], UAI 2022 [ImolaUAI2022], EDBT 2023 [Pappachan-EDBT2023].
  • Figure 2: Computational framework.
  • Figure 3: Block ladder structure of the PMO formulation.
  • Figure 4: The Benders decomposition framework.
  • Figure 5: Example of MP components (each red circle highlights an MP's component).
  • ...and 9 more figures

Theorems & Definitions (7)

  • Definition 2.1
  • Definition 2.2
  • Definition 3.1
  • Definition 3.2
  • Proposition 3.3
  • proof
  • proof