Practical and Ready-to-Use Methodology to Assess the re-identification Risk in Anonymized Datasets
Louis-Philippe Sondeck, Maryline Laurent
TL;DR
This paper addresses the lack of a precise, actionable method for assessing re-identification risk in anonymized datasets. It introduces a practical framework, modeled on the cybersecurity risk method EBIOS, that scores risk as the product of Severity and Likelihood: $R = S \times L$. Severity uses the CNIL-PIA taxonomy to rate bodily, material, and moral impacts, while Likelihood decomposes into two components, Exposure of attributes and Inference vulnerability. The approach is illustrated on realistic datasets (k-anonymity and HIPAA anonymization) to produce risk profiles that guide targeted anonymization decisions and help balance privacy against data utility.
Abstract
To prove that a dataset is sufficiently anonymized, many privacy policies suggest that a re-identification risk assessment be performed, but do not provide a precise methodology for doing so, leaving the industry alone with the problem. This paper proposes a practical and ready-to-use methodology for re-identification risk assessment, the originality of which is manifold: (1) it is the first to follow well-known risk analysis methods (e.g. EBIOS) that have been used in the cybersecurity field for years, which consider not only the ability to perform an attack, but also the impact such an attack can have on an individual; (2) it is the first to qualify attributes and values of attributes with e.g. degree of exposure, as known real-world attacks mainly target certain types of attributes and not others.
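As a rough illustration of the severity-times-likelihood scoring summarized above, the following Python sketch rates a few attributes and ranks them by risk. It is not the authors' implementation. The 1 to 4 ordinal scale, the `AttributeRating` class, the example attribute names and ratings, and in particular the rule that merges Exposure and Inference into a single Likelihood (here, the larger of the two) are all assumptions made for the example, since this summary does not state how the two components are combined.

```python
from dataclasses import dataclass

# ASSUMPTION: every rating uses a 1-4 ordinal scale
# (1 = negligible, 2 = limited, 3 = significant, 4 = maximum).
SCALE = range(1, 5)


@dataclass(frozen=True)
class AttributeRating:
    """Hypothetical container for the ratings of one dataset attribute."""
    name: str
    severity: int   # impact on the individual if re-identified
    exposure: int   # how readily an attacker can obtain the attribute
    inference: int  # how vulnerable the attribute is to inference


def likelihood(rating: AttributeRating) -> int:
    # ASSUMPTION: Likelihood is taken as the larger of Exposure and
    # Inference. The combination rule used in the paper may differ.
    return max(rating.exposure, rating.inference)


def risk(rating: AttributeRating) -> int:
    """Compute R = S * L for one attribute."""
    for value in (rating.severity, rating.exposure, rating.inference):
        if value not in SCALE:
            raise ValueError(f"rating {value} is outside the 1-4 scale")
    return rating.severity * likelihood(rating)


def rank_by_risk(ratings: list[AttributeRating]) -> list[AttributeRating]:
    """Return the attributes ordered from highest to lowest risk."""
    return sorted(ratings, key=risk, reverse=True)


# Illustrative ratings only; they are not taken from the paper.
ratings = [
    AttributeRating("zip_code", severity=2, exposure=4, inference=3),
    AttributeRating("diagnosis", severity=4, exposure=1, inference=3),
    AttributeRating("favorite_color", severity=1, exposure=2, inference=1),
]

for r in rank_by_risk(ratings):
    print(f"{r.name}: S={r.severity}, L={likelihood(r)}, R={risk(r)}")
```

A ranking of this kind is what makes the assessment actionable: the attributes at the top of the list are the first candidates for stronger generalization or suppression, while low-risk attributes can be left intact to preserve utility.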
