RIPOST: Two-Phase Private Decomposition for Multidimensional Data
Ala Eddine Laouir, Abdessamad Imine
TL;DR
RIPOST tackles the challenge of releasing accurate multidimensional data under differential privacy by introducing a two-phase, depth-free domain-decomposition framework. It splits the decomposition into an initial phase that isolates empty regions and a secondary phase that minimizes Aggregation Error (AE) within populated blocks, publishing a leaf-based private view whose leaves are perturbed to satisfy DP. A key contribution is a privacy-budget distribution strategy that uses a convergent series with bounded sum, eliminating the need to predefine a decomposition depth $h$ and enabling flexible, data-aware partitioning. Empirical results show RIPOST outperforms state-of-the-art methods (e.g., HDPView, PrivTree, PrivBayes, P3GM) on multiple datasets and across increasing dimensionality, demonstrating improved utility (lower relative RMSE) and scalable performance for OLAP-style queries on tensors.
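To make the depth-free budget idea concrete, here is a minimal sketch that assumes a geometric series as the convergent series. The summary above does not state which series RIPOST actually uses, so the function names `level_budget` and `spent_up_to` and the parameter `ratio` are hypothetical. The point is only that the per-level budgets sum to at most the total budget at any depth, so $h$ never has to be fixed in advance.

```python
def level_budget(eps_total: float, level: int, ratio: float = 0.5) -> float:
    """Budget assigned to decomposition level `level` (0-indexed).

    Assumption: a geometric series eps_total * (1 - ratio) * ratio**level,
    whose infinite sum is exactly eps_total. This is an illustrative choice,
    not necessarily the series used by RIPOST.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    return eps_total * (1.0 - ratio) * ratio ** level


def spent_up_to(eps_total: float, depth: int, ratio: float = 0.5) -> float:
    """Total budget consumed along a root-to-leaf path of the given depth."""
    return sum(level_budget(eps_total, i, ratio) for i in range(depth))


# The spent budget stays below eps_total however deep the decomposition goes.
for depth in (1, 3, 10, 50):
    print(depth, spent_up_to(1.0, depth))
```

Under sequential composition, the privacy cost along any path is the sum of the per-level budgets, so a bounded series keeps the total within the overall budget without knowing the final depth.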
Abstract
Differential privacy (DP) is considered the gold standard for data privacy. While the problem of answering simple queries and functions under DP guarantees has been thoroughly addressed in recent years, the problem of releasing multidimensional data under DP remains challenging. In this paper, we focus on this problem, in particular on how to construct privacy-preserving views using a domain decomposition approach. The main idea is to recursively split the domain into sub-domains until a convergence condition is met. The resulting sub-domains are perturbed and then published so that they can be used to answer arbitrary queries. Existing methods that address this problem using domain decomposition face two main challenges: (i) efficient privacy budget management over a variable and undefined decomposition depth $h$; and (ii) defining an optimal data-dependent splitting strategy that minimizes the error in the sub-domains while ensuring the smallest possible decomposition. To address these challenges, we present RIPOST, a multidimensional data decomposition algorithm that bypasses the constraint of a predefined depth $h$ and applies a data-aware splitting strategy to optimize the quality of the decomposition results. The core of RIPOST is a two-phase strategy that first separates non-empty sub-domains from empty ones at an early stage by exploiting the properties of multidimensional datasets, and then decomposes the resulting sub-domains with minimal inaccuracies using the mean function. Moreover, RIPOST introduces a privacy budget distribution that allows decomposition without requiring prior computation of the depth $h$. Through extensive experiments, we demonstrate that RIPOST outperforms state-of-the-art methods in terms of data utility and accuracy on a variety of datasets and test cases.
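The generic pipeline described in the abstract (split recursively until a convergence condition holds, perturb the resulting sub-domains, answer queries from them) can be sketched as follows on a one-dimensional count vector. This is a simplified illustration, not RIPOST itself: the names `aggregation_error`, `decompose`, `private_view` and `answer_range` are hypothetical, the aggregation error is assumed to be the sum of absolute deviations from the block mean, and the split test below reads the raw data, so the decomposition step in this sketch is NOT differentially private. Only the final leaf perturbation uses Laplace noise.

```python
import random


def aggregation_error(block):
    """Assumed AE: total absolute deviation when each cell is replaced by the block mean."""
    mean = sum(block) / len(block)
    return sum(abs(x - mean) for x in block)


def decompose(counts, lo, hi, threshold, leaves):
    """Recursively halve [lo, hi) until AE <= threshold.

    WARNING: this split test is non-private; a DP algorithm must randomize
    the decision and charge it against the privacy budget.
    """
    block = counts[lo:hi]
    if hi - lo == 1 or aggregation_error(block) <= threshold:
        leaves.append((lo, hi, float(sum(block))))
        return
    mid = (lo + hi) // 2
    decompose(counts, lo, mid, threshold, leaves)
    decompose(counts, mid, hi, threshold, leaves)


def private_view(leaves, epsilon, rng):
    """Add Laplace(1/epsilon) noise to each leaf total.

    Assumes each individual contributes to exactly one cell, so the
    sensitivity of the disjoint leaf totals is 1. The difference of two
    exponential variates with rate epsilon is Laplace with scale 1/epsilon.
    """
    return [(lo, hi, total + rng.expovariate(epsilon) - rng.expovariate(epsilon))
            for lo, hi, total in leaves]


def answer_range(view, q_lo, q_hi):
    """Estimate the count in [q_lo, q_hi), assuming uniformity inside each leaf."""
    estimate = 0.0
    for lo, hi, total in view:
        overlap = max(0, min(hi, q_hi) - max(lo, q_lo))
        estimate += total * overlap / (hi - lo)
    return estimate


counts = [0, 0, 0, 0, 5, 5, 9, 1]
leaves = []
decompose(counts, 0, len(counts), threshold=0.0, leaves=leaves)
view = private_view(leaves, epsilon=1.0, rng=random.Random(0))
print(answer_range(leaves, 4, 7), answer_range(view, 4, 7))
```

In this toy input the empty region `[0, 4)` becomes a single leaf after one split, which hints at why isolating empty sub-domains early keeps the decomposition small.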
