Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition
Chenxi Qiu
TL;DR
LP-based Metric Differential Privacy (mDP) scales poorly because of the large number of decision variables and constraints. The paper introduces a computational framework that partitions the secret dataset via an mDP graph and applies a two-stage Benders decomposition (BD), consisting of a master program and a set of subproblems, to solve the resulting PMO efficiently. The key contributions are a distance-vector-based secret dataset partitioning method that preserves strong intra-subset mDP constraints while balancing subset sizes, a block-ladder PMO reformulation amenable to BD, and experiments on geo-location, text, and synthetic data demonstrating up to $9\times$ scalability gains. This approach enables near-optimal mDP perturbation for large-scale metric spaces, with practical impact on privacy-preserving data publishing in real-world applications.
Abstract
Metric Differential Privacy (mDP) extends the concept of Differential Privacy (DP) to serve as a new paradigm of data perturbation. It is designed to protect secret data represented in a general metric space, such as text data encoded as word embeddings or geo-location data on road networks or grid maps. A widely used method for deriving an optimal data perturbation mechanism under mDP is linear programming (LP). LP, however, can suffer from a polynomial explosion in the number of decision variables, which makes it impractical for large-scale mDP. In this paper, our objective is to develop a new computational framework that enhances the scalability of LP-based mDP. Exploiting the connections that the mDP constraints establish among the secret records, we partition the original secret dataset into subsets. Building upon this partition, we reformulate the LP problem for mDP and solve it via Benders Decomposition, which is composed of two stages: (1) a master program that manages the perturbation calculation across subsets and (2) a set of subproblems, each managing the perturbation derivation within a subset. Our experimental results on multiple datasets, including geo-location data on road networks and grid maps, text data, and synthetic data, demonstrate the superior scalability and efficiency of the proposed mechanism.
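To make the scalability issue concrete, the following minimal sketch uses the standard mDP constraint, z[i][k] <= exp(eps * d(i, j)) * z[j][k] for every pair of secret records i, j and every output k. It is not the paper's algorithm: the function names (`exponential_mechanism`, `satisfies_mdp`, `lp_size`) are hypothetical, the exponential mechanism stands in for an LP-optimized perturbation matrix, and the constraint count assumes the plain LP formulation with one constraint per ordered pair of records and per output.

```python
import math


def exponential_mechanism(dist, eps):
    """Build a perturbation matrix z, where z[i][k] is the probability of
    reporting record k when the secret record is i.

    Weights exp(-eps * d / 2) give eps-mDP when dist is a metric
    (this follows from the triangle inequality).
    """
    n = len(dist)
    z = []
    for i in range(n):
        weights = [math.exp(-eps * dist[i][k] / 2.0) for k in range(n)]
        total = sum(weights)
        z.append([w / total for w in weights])
    return z


def satisfies_mdp(z, dist, eps, tol=1e-12):
    """Check z[i][k] <= exp(eps * dist[i][j]) * z[j][k] for all i != j and all k."""
    n, m = len(z), len(z[0])
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            bound = math.exp(eps * dist[i][j])
            for k in range(m):
                if z[i][k] > bound * z[j][k] + tol:
                    return False
    return True


def lp_size(n_secrets, n_outputs):
    """Count decision variables and mDP constraints of the assumed plain LP:
    one variable per (secret, output) pair and one constraint per ordered
    pair of distinct secrets and per output."""
    variables = n_secrets * n_outputs
    mdp_constraints = n_secrets * (n_secrets - 1) * n_outputs
    return variables, mdp_constraints


# Four secret records on a line, with the absolute difference as the metric.
points = [0.0, 1.0, 2.0, 3.0]
dist = [[abs(a - b) for b in points] for a in points]

z = exponential_mechanism(dist, eps=1.0)
print(satisfies_mdp(z, dist, eps=1.0))
print(lp_size(100, 100))
```

Under these assumptions, 100 secret records with 100 possible outputs already yield 10,000 variables and 990,000 mDP constraints, which illustrates why splitting the secret dataset into subsets and solving smaller subproblems is attractive.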
