MuCo: Publishing Microdata with Privacy Preservation through Mutual Cover
Boyu Li, Jianfeng Ma, Junhua Xi, Lili Zhang, Tao Xie, Tongfei Shang
TL;DR
MuCo tackles privacy in microdata publishing by replacing direct generalization of QI values with a mutual cover mechanism that perturbs QI values according to group-specific random output tables under a $\delta$-probability constraint. This design preserves the distributions of quasi-identifiers more faithfully than traditional $k$-anonymity generalization while protecting against both identity and attribute disclosure, and it treats the anonymization process as a hidden operation to confuse adversaries. The method partitions the data into covering groups, optimizes the random output tables via a linear program, and samples from these tables to generate anonymized microdata, yielding more accurate query answers than Mondrian or Anatomy. Extensive experiments on US Census data show that MuCo incurs lower information loss at comparable privacy levels and maintains stable query accuracy, while the parameter $\delta$ and the diversity parameter $l$ allow privacy to be traded off against utility. The approach offers a practical alternative to differential privacy in microdata publication by explicitly preserving QI distributions and enabling precise tuple-level query results while mitigating re-identification risks.
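The construction of a random output table can be illustrated with a small sketch. It rests on assumptions that are our reading, not the paper's stated formulation: a group's table is a row-stochastic matrix P where P[i][j] is the probability that tuple i is published with tuple j's QI values, the $\delta$ constraint caps every entry at $\delta$, and cost is a hypothetical L1 distance between numeric QI vectors. In place of the paper's linear program, it uses a row-wise greedy fill (cheapest output first, up to $\delta$ each), which is optimal only when each row is optimized independently. The helper names `qi_distance`, `greedy_output_table`, and `expected_cost` are hypothetical.

```python
from typing import List, Sequence


def qi_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Hypothetical cost: L1 distance between two numeric QI vectors.
    return float(sum(abs(x - y) for x, y in zip(a, b)))


def greedy_output_table(qis: List[Sequence[float]], delta: float) -> List[List[float]]:
    """Build a row-stochastic table where entry [i][j] is the probability that
    tuple i is published with tuple j's QI values.

    ASSUMPTION: the delta constraint is modeled as a cap on every entry.
    Each row is filled cheapest-first (fractional knapsack), which minimizes
    that row's expected cost when rows are optimized independently. This is
    NOT the paper's linear program, which may add cross-row constraints.
    """
    n = len(qis)
    if delta * n < 1.0:
        raise ValueError("infeasible: need delta >= 1/len(group)")
    table = []
    for i in range(n):
        row = [0.0] * n
        remaining = 1.0
        # Visit candidate outputs from the cheapest to the most expensive.
        for j in sorted(range(n), key=lambda j: qi_distance(qis[i], qis[j])):
            take = min(delta, remaining)
            row[j] = take
            remaining -= take
            if remaining <= 1e-12:
                break
        table.append(row)
    return table


def expected_cost(qis: List[Sequence[float]], table: List[List[float]]) -> float:
    # Expected distortion: sum over i, j of P[i][j] * distance(QI_i, QI_j).
    n = len(qis)
    return sum(table[i][j] * qi_distance(qis[i], qis[j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    group = [(25, 1), (27, 1), (40, 0)]  # toy (age, flag) QI vectors
    table = greedy_output_table(group, delta=0.5)
    print(table, expected_cost(group, table))
```

For the toy group with `delta = 0.5`, the rows come out as `[0.5, 0.5, 0.0]`, `[0.5, 0.5, 0.0]`, and `[0.0, 0.5, 0.5]`, with expected cost 9.0. The column sums (1.0, 1.5, 0.5) show that this row-wise heuristic skews the group's QI distribution, which is presumably why a joint linear program with further constraints is needed if the published QI distribution is to be preserved.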
Abstract
We study anonymization techniques of the k-anonymity family for preserving privacy in the publication of microdata. Although existing approaches based on generalization can provide adequate protection, the generalized table always suffers from considerable information loss, mainly because the distributions of QI (Quasi-Identifier) values are barely preserved and the results of query statements are groups rather than specific tuples. To address this, we propose a novel technique, called Mutual Cover (MuCo), to prevent the adversary from matching the combination of QI values in published microdata. The rationale is to replace some original QI values with random values according to random output tables, so that similar tuples cover for each other at minimum cost. As a result, MuCo can prevent both identity disclosure and attribute disclosure while retaining information utility more effectively than generalization. The effectiveness of MuCo is verified through extensive experiments.
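Replacing original QI values according to a random output table can be sketched as follows. The sketch assumes the table is row-stochastic (entry `table[i][j]` is the probability that tuple i is published with tuple j's QI values) and that sensitive values stay attached to their original tuples. The function name `publish_group` and the `(qi, sensitive)` record layout are hypothetical; the code only illustrates sampling from a given table, not how MuCo builds or optimizes it.

```python
import random
from typing import Any, List, Sequence, Tuple

Record = Tuple[Tuple[Any, ...], Any]  # hypothetical layout: (QI tuple, sensitive value)


def publish_group(records: List[Record], table: Sequence[Sequence[float]],
                  rng: random.Random) -> List[Record]:
    """For each tuple i, draw an output index j with probability table[i][j]
    and publish tuple j's QI values in place of tuple i's own.
    The sensitive value of tuple i is kept unchanged."""
    n = len(records)
    if len(table) != n or any(len(row) != n for row in table):
        raise ValueError("table must be n x n for a group of n records")
    published = []
    for i, (_, sensitive) in enumerate(records):
        # random.choices samples indices 0..n-1 weighted by row i of the table.
        j = rng.choices(range(n), weights=table[i], k=1)[0]
        published.append((records[j][0], sensitive))
    return published


if __name__ == "__main__":
    group = [((25, "M"), "flu"), ((27, "M"), "cold"), ((40, "F"), "hiv")]
    table = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
    print(publish_group(group, table, random.Random(7)))
```

Because each tuple may be published with a neighbor's QI values, an adversary who knows a target's exact QI combination cannot be sure that a matching published tuple is the target; the output varies with the random seed.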
