An Algorithm for Enhancing Privacy-Utility Tradeoff in the Privacy Funnel and Other Lift-based Measures
Mohammad Amin Zarrabian, Parastoo Sadeghi
TL;DR
The paper addresses the privacy-utility tradeoff in the privacy funnel by replacing the standard mutual-information (MI) privacy constraint with a semi-pointwise measure $L(y)$, which keeps the optimization tractable and improves utility at a fixed privacy budget. It builds on max-lift ideas by using the average information density per observation, and it introduces a heuristic algorithm that combines high-ε lift-polytope vertices to identify extreme privacy points, yielding higher utility $I(X;Y)$ under the same privacy constraint. The approach outperforms prior lift-based and subset-merging methods across lift-based measures, including the $\ell_{1}$-norm and $\chi^{2}$-divergence, and it matches theoretical benchmarks in the strong $\chi^{2}$ framework. These results suggest practical gains for privacy-preserving data sharing with lift-based privacy measures.
Abstract
This paper investigates the privacy funnel, a privacy-utility tradeoff problem in which mutual information quantifies both privacy and utility. The objective is to maximize utility while adhering to a specified privacy budget. However, the privacy funnel is a non-convex optimization problem, which makes an optimal solution difficult to obtain. An existing approach substitutes the mutual information with the lift (the exponential of the information density) and then solves the resulting optimization. Since mutual information is the expectation of the information density, this substitution overestimates the privacy loss, so the mutual information actually leaked ends up smaller than what the budget allows. This significantly compromises utility. To overcome this limitation, we propose a privacy measure that is more relaxed than the lift but stricter than mutual information, while still allowing the optimization to be solved efficiently. Instead of using the information density directly, our proposed measure is the average of the information density over the sensitive data distribution for each observed data realization. We then introduce a heuristic algorithm that finds solutions attaining extreme privacy values, which enhances utility. Numerical results confirm improved utility at the same privacy budget compared to existing solutions in the literature. Additionally, we explore two other privacy measures, the $\ell_{1}$-norm and the strong $\chi^{2}$-divergence, demonstrating the applicability of our algorithm to these lift-based measures, and we evaluate our method by comparing its output with previous works. Finally, we validate our heuristic approach against a theoretical framework that estimates the optimal utility for the strong $\chi^{2}$-divergence, numerically showing a perfect match.
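The ordering of the three privacy measures described above can be checked numerically on a toy example. The sketch below is illustrative only and is not the paper's algorithm. It assumes a sensitive variable named $S$ (a label not used in the text above), a strictly positive joint distribution over $(S, Y)$ chosen arbitrarily, natural logarithms, and that the per-observation average is taken with respect to the posterior $P(s \mid y)$, so that mutual information is the expectation of $L(y)$ over $Y$. The function name `privacy_measures` is hypothetical.

```python
import numpy as np

def privacy_measures(joint):
    """Compare three privacy measures for a joint pmf, joint[s, y] = P(S=s, Y=y).

    Assumes every entry of `joint` is strictly positive and the entries sum to 1.
    Returns (mutual information, L(y) for each y, maximum log-lift), in nats.
    """
    joint = np.asarray(joint, dtype=float)
    p_s = joint.sum(axis=1, keepdims=True)      # marginal of S, shape (|S|, 1)
    p_y = joint.sum(axis=0, keepdims=True)      # marginal of Y, shape (1, |Y|)

    # Information density i(s, y) = log lift(s, y) = log P(s, y) / (P(s) P(y)).
    info_density = np.log(joint / (p_s * p_y))

    # Assumed semi-pointwise measure: average of i(s, y) over P(s | y), per y.
    posterior = joint / p_y                     # each column sums to 1
    semi_pointwise = (posterior * info_density).sum(axis=0)

    # Mutual information is the expectation of L(y) over the marginal of Y.
    mutual_info = float((p_y.ravel() * semi_pointwise).sum())
    max_log_lift = float(info_density.max())
    return mutual_info, semi_pointwise, max_log_lift

# Arbitrary toy joint distribution (an assumption, not data from the paper).
joint = np.array([[0.30, 0.10],
                  [0.10, 0.50]])
mi, L_y, max_log_lift = privacy_measures(joint)
print(f"I(S;Y) = {mi:.4f}, max_y L(y) = {L_y.max():.4f}, max log-lift = {max_log_lift:.4f}")
```

Under these assumptions, bounding $\max_y L(y)$ by a budget also bounds the mutual information, while being less conservative than bounding the maximum log-lift, which is the sense in which the proposed measure sits between the two.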
