How to Get Actual Privacy and Utility from Privacy Models: the k-Anonymity and Differential Privacy Families

Josep Domingo-Ferrer, David Sánchez

TL;DR

This paper examines whether the two dominant privacy-model families, $k$-anonymity and $\epsilon$-differential privacy (DP), deliver robust ex ante privacy guarantees with usable utility. It shows that syntactic $k$-anonymity can be vulnerable to downcoding and lacks post-processing immunity, while DP is impractically noisy for small budgets and in non-interactive settings, and its guarantees become meaningless for large budgets, necessitating ex post risk checks. The authors advocate a semantic reformulation of $k$-anonymity, notably probabilistic $k$-anonymity, to bound reidentification risk without over-regularizing quasi-identifiers, and discuss DP relaxations (e.g., $(\epsilon,\delta)$-DP, RDP, zCDP) and deterministic variants (metric DP, individual DP) that can yield better utility under weakened guarantees. They conclude that a careful combination of semantic $k$-anonymity and DP relaxations can provide robust privacy-utility trade-offs, though selecting appropriate parameters remains challenging and ex post verification may still be necessary for practical assurance.

Abstract

Privacy models were introduced in privacy-preserving data publishing and statistical disclosure control with the promise to end the need for costly empirical assessment of disclosure risk. We examine how well this promise is kept by the main privacy models. We find they may fail to provide adequate protection guarantees because of problems in their definition or incur unacceptable trade-offs between privacy protection and utility preservation. Specifically, k-anonymity may not entirely exclude disclosure if enforced with deterministic mechanisms or without constraints on the confidential values. On the other hand, differential privacy (DP) incurs unacceptable utility loss for small budgets and its privacy guarantee becomes meaningless for large budgets. In the latter case, an ex post empirical assessment of disclosure risk becomes necessary, undermining the main appeal of privacy models. Whereas the utility preservation of DP can only be improved by relaxing its privacy guarantees, we argue that a semantic reformulation of k-anonymity can offer more robust privacy without losing utility with respect to traditional syntactic k-anonymity.
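
The budget trade-off stated in the abstract can be illustrated with a minimal sketch. It assumes the standard Laplace mechanism applied to a counting query with sensitivity 1; that choice, and the helper names `laplace_scale` and `likelihood_ratio_bound`, are illustrative assumptions and are not taken from the paper. For each budget, the sketch reports the noise standard deviation (the utility cost) and the factor $e^{\epsilon}$ by which output probabilities may differ between neighboring datasets (the strength of the guarantee).

```python
import math


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of the Laplace noise that yields epsilon-DP for a query
    with the given L1 sensitivity (standard Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def likelihood_ratio_bound(epsilon: float) -> float:
    """Upper bound e^epsilon on the ratio of output probabilities
    between two neighboring datasets under epsilon-DP."""
    return math.exp(epsilon)


if __name__ == "__main__":
    # Hypothetical budgets chosen only to show both ends of the trade-off.
    for eps in (0.1, 1.0, 10.0):
        b = laplace_scale(sensitivity=1.0, epsilon=eps)
        noise_std = math.sqrt(2) * b  # std of Laplace(b) is sqrt(2) * b
        bound = likelihood_ratio_bound(eps)
        print(f"epsilon={eps}: noise std={noise_std:.2f}, ratio bound={bound:.1f}")
```

Under these assumptions, a small budget such as $\epsilon = 0.1$ gives a noise standard deviation of about 14 on a single count, while a large budget such as $\epsilon = 10$ allows output probabilities to differ by a factor above 22,000, which bounds an attacker's inference only very loosely.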

Paper Structure

This paper contains 14 sections, 1 equation.