Differentially Private Release of Hierarchical Origin/Destination Data with a TopDown Approach
Fabrizio Boninsegna, Francesco Silvestri
TL;DR
The paper addresses the private release of hierarchical origin-destination (O/D) data under bounded differential privacy. It introduces InfTDA, a TopDown mechanism that minimizes the Chebyshev distance and uses an integer optimizer, IntOpt, to enforce non-negativity and hierarchical consistency. It provides a theoretical bound on the maximum absolute error and demonstrates reduced false positives while maintaining hierarchical accuracy, validated on real ISTAT O/D data and synthetic datasets. The approach generalizes TopDown to non-negative hierarchical trees and offers a practical, faster alternative to existing TDA variants, and it applies to other tabular data with a hierarchical structure. Overall, it delivers high-utility differentially private O/D datasets that remain coherent across geographic scales, supporting reliable downstream marginal queries and decision-making.
Abstract
This paper presents a novel method for generating differentially private tabular datasets for hierarchical data, specifically focusing on origin-destination (O/D) trips. The approach builds upon the TopDown algorithm, a constraint-based mechanism developed by the U.S. Census Bureau to incorporate invariant queries into tabular data. O/D hierarchical data are datasets representing trips between geographical areas organized in a hierarchical structure (e.g., region $\rightarrow$ province $\rightarrow$ city). The proposed method is designed to improve the accuracy of queries covering broader geographical areas, which are derived through aggregation. This feature provides a "zoom-in" effect on the dataset while ensuring that, when zooming back out, the overall picture is preserved. Furthermore, the approach aims to reduce the number of false positives. These characteristics can strengthen practitioners' and decision-makers' confidence in adopting differentially private datasets. The main technical contributions of this paper are twofold. First, we introduce a novel TopDown algorithm that employs constrained optimization with Chebyshev distance minimization, with theoretical guarantees on the maximum absolute error. Second, we propose a new integer optimization algorithm that significantly reduces the incidence of false positives. The effectiveness of the proposed approach is validated using real-world and synthetic O/D datasets, demonstrating its ability to generate private data with high utility and a reduced number of false positives. Our experiments focus on O/D datasets with a single trip as the unit of privacy; nevertheless, the proposed approach supports other units of privacy and also applies to any tabular data with a hierarchical structure.
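As an illustration of what Chebyshev distance minimization under non-negativity and hierarchical consistency means at a single node of the hierarchy, the following is a minimal Python sketch. It is not the InfTDA or IntOpt algorithm of the paper: the function name `chebyshev_project`, the linear search over the radius `t`, and the greedy fill are assumptions made for illustration, and the noise-addition step that precedes the optimization is omitted. Given noisy integer counts for the children of a node and an already fixed parent total, it returns non-negative integers that sum to the parent total and minimize the maximum absolute deviation from the noisy counts.

```python
from typing import List, Tuple


def chebyshev_project(noisy: List[int], parent_total: int) -> Tuple[List[int], int]:
    """Illustrative sketch (not the paper's IntOpt).

    Returns (x, t) where x holds non-negative integers with
    sum(x) == parent_total and t = max_i |x[i] - noisy[i]| is minimal.
    """
    if parent_total < 0 or not noisy:
        raise ValueError("need a non-negative parent total and at least one child")

    def bounds(t: int) -> Tuple[List[int], List[int]]:
        # Each child may move at most t away from its noisy value
        # and may never drop below zero.
        lo = [max(0, c - t) for c in noisy]
        hi = [c + t for c in noisy]
        return lo, hi

    def feasible(t: int) -> bool:
        lo, hi = bounds(t)
        intervals_ok = all(h >= l for l, h in zip(lo, hi))
        return intervals_ok and sum(lo) <= parent_total <= sum(hi)

    # Feasibility is monotone in t, so the first feasible radius is optimal.
    t = 0
    while not feasible(t):
        t += 1

    # Start from the lower bounds and greedily hand out the remaining mass.
    lo, hi = bounds(t)
    x = lo[:]
    remaining = parent_total - sum(lo)
    for i in range(len(x)):
        add = min(remaining, hi[i] - lo[i])
        x[i] += add
        remaining -= add
    return x, t


if __name__ == "__main__":
    # Hypothetical noisy child counts and a parent total fixed one level up.
    children, radius = chebyshev_project([5, -2, 3], 7)
    print(children, radius)
```

In this hypothetical input, the negative noisy count is raised to zero and the children are adjusted so that aggregating them reproduces the parent total, which is the consistency property the abstract refers to as preserving the picture when zooming back out. The greedy fill returns one optimal solution among possibly many; choosing among them in a way that reduces false positives is the role the paper assigns to its own integer optimizer and is not modeled here.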
