DeSIA: Attribute Inference Attacks Against Limited Fixed Aggregate Statistics
Yifeng Mao, Bozhidar Stevanoski, Yves-Alexandre de Montjoye
TL;DR
This work addresses the privacy risks of releasing fixed aggregate statistics by introducing DeSIA, a two-module attribute inference attack (AIA) that combines a deterministic module based on constraint programming with a stochastic, ML-based predictor. DeSIA substantially outperforms state-of-the-art reconstruction attacks (CIP, RAP) on the Census PPMF and ACS datasets, achieving a high AUC and a notably strong true positive rate (TPR) at very low false positive rates (FPR), e.g., a TPR of about $0.14$ at an FPR of $10^{-3}$. The authors provide a formal privacy game framework for AIAs against fixed aggregates, show that the attack remains effective as the number of released aggregates and the level of added noise vary, and conduct an extensive ablation study to justify each component. They further adapt DeSIA to membership inference, where it again surpasses the baselines. Overall, the paper presents a practical hybrid attack showing that aggregation alone does not suffice to protect privacy, even for limited releases of tabular aggregates, and it highlights the need for formal privacy mechanisms and testing.
Abstract
Empirical inference attacks are a popular approach for evaluating the privacy risk of data release mechanisms in practice. While an active attack literature exists to evaluate machine learning models or synthetic data releases, we currently lack comparable methods for fixed aggregate statistics, in particular when only a limited number of statistics are released. Here, we propose an inference attack framework against fixed aggregate statistics and an attribute inference attack called DeSIA. We instantiate DeSIA against the U.S. Census PPMF dataset and show that it strongly outperforms reconstruction-based attacks. In particular, we show DeSIA to be highly effective at identifying vulnerable users, achieving a true positive rate of 0.14 at a false positive rate of $10^{-3}$. We then show that DeSIA performs well against users whose attributes cannot be verified, and when varying the number of aggregate statistics and the level of noise addition. We also perform an extensive ablation study of DeSIA and show how it can be successfully adapted to the membership inference task. Overall, our results show that aggregation alone is not sufficient to protect privacy, even when a relatively small number of aggregates is released, and emphasize the need for formal privacy mechanisms and testing before aggregate statistics are released.
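
The headline metric above, the true positive rate at a fixed low false positive rate, can be made concrete with a short sketch. The code below is an illustration only and is not taken from the paper: the function name `tpr_at_fpr`, the convention that a higher score means "more likely positive", and the strict-inequality thresholding rule are all assumptions made for this example.

```python
import numpy as np


def tpr_at_fpr(scores, labels, target_fpr):
    """Return the TPR at the loosest threshold whose FPR stays <= target_fpr.

    Hypothetical helper for illustration. `scores` are attack confidence
    values (higher = more likely positive) and `labels` are the ground-truth
    binary outcomes (1 = positive, 0 = negative).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = np.sort(scores[~labels])[::-1]  # negative scores, descending
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative example")

    # Number of negatives we may flag without exceeding the target FPR.
    k = int(np.floor(target_fpr * len(neg)))

    # Flag a record when its score is strictly greater than the threshold.
    # Using the (k+1)-th largest negative score as the threshold flags at
    # most k negatives, even when scores are tied.
    threshold = neg[k] if k < len(neg) else -np.inf
    return float(np.mean(pos > threshold))


if __name__ == "__main__":
    # Synthetic scores, only to show the call; not data from the paper.
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=20_000)
    scores = rng.normal(loc=labels.astype(float), scale=1.0)
    print(tpr_at_fpr(scores, labels, target_fpr=1e-3))
```

Under this convention, a target FPR of $10^{-3}$ only permits a nonzero number of false positives when at least 1,000 negative examples are available, so the metric is informative only on sufficiently large evaluation sets.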
