Practical and Ready-to-Use Methodology to Assess the re-identification Risk in Anonymized Datasets

Louis-Philippe Sondeck; Maryline Laurent

TL;DR

This paper tackles the lack of a precise, actionable method for assessing re-identification risk in anonymized datasets. Following the cybersecurity risk method EBIOS, it introduces a practical framework that combines Severity and Likelihood, and adds the Exposure of attributes and their Inference vulnerability to quantify risk as $R = S \times L$. Severity uses the CNIL-PIA taxonomy to rate bodily, material, and moral impacts, while Likelihood is decomposed into Exposure and Inference components. The approach is illustrated on realistic example datasets (anonymized with k-anonymity and with the HIPAA method), producing actionable risk profiles that guide targeted anonymization decisions and help balance privacy against data utility.
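A minimal Python sketch of the $R = S \times L$ computation, under stated assumptions: the four levels (negligible, limited, significant, maximum) follow the scale commonly used in the CNIL PIA guides and are encoded as 1 to 4; overall Severity is taken as the worst of the bodily, material, and moral ratings; and Likelihood is supplied directly as a level. The names `Level`, `severity`, and `risk_score` are hypothetical and are not taken from the paper.

```python
from enum import IntEnum


class Level(IntEnum):
    """Assumed four-level scale in the style of the CNIL PIA guides."""
    NEGLIGIBLE = 1
    LIMITED = 2
    SIGNIFICANT = 3
    MAXIMUM = 4


def severity(bodily: Level, material: Level, moral: Level) -> Level:
    """Assumption: overall severity is the worst of the three impact ratings."""
    return max(bodily, material, moral)


def risk_score(s: Level, l: Level) -> int:
    """R = S x L on the assumed 1-4 scales, giving a score from 1 to 16."""
    return int(s) * int(l)


# Hypothetical ratings for an anonymized health dataset
s = severity(bodily=Level.LIMITED, material=Level.LIMITED, moral=Level.SIGNIFICANT)
print(s.name, risk_score(s, Level.LIMITED))  # SIGNIFICANT 6
```

Taking the maximum is a conservative choice: a single severe impact type is enough to make the outcome severe. A weighted combination of the three impact types would be an equally plausible reading.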

Abstract

To prove that a dataset is sufficiently anonymized, many privacy policies recommend performing a re-identification risk assessment but do not provide a precise methodology for doing so, leaving industry to tackle the problem alone. This paper proposes a practical and ready-to-use methodology for re-identification risk assessment whose originality is twofold: (1) it is the first to follow well-known risk analysis methods (e.g., EBIOS), long used in the cybersecurity field, which consider not only the ability to perform an attack but also the impact such an attack can have on an individual; (2) it is the first to qualify attributes and attribute values by properties such as their degree of exposure, since known real-world attacks mainly target certain types of attributes and not others.
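The idea of qualifying both attributes and individual attribute values by their degree of exposure can be sketched as follows. This is an illustration only: the three-level scale, the per-value overrides, the rule that a record is as exposed as its most exposed released value, and the names `Attribute` and `record_exposure` are all assumptions, and the example labels are not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical 3-level exposure scale: how easily an attacker can find a value elsewhere
LOW, MEDIUM, HIGH = 1, 2, 3


@dataclass
class Attribute:
    name: str
    exposure: int  # default exposure assigned to the attribute by the analyst
    value_exposure: dict[str, int] = field(default_factory=dict)  # per-value overrides

    def exposure_of(self, value: str) -> int:
        """Exposure of one value: its override if present, else the attribute default."""
        return self.value_exposure.get(value, self.exposure)


def record_exposure(record: dict[str, str], attributes: list[Attribute]) -> int:
    """Assumption: a record is as exposed as its most exposed released value."""
    return max(a.exposure_of(record[a.name]) for a in attributes)


# Illustrative labels only, not taken from the paper
attrs = [
    Attribute("zip_code", MEDIUM),
    Attribute("diagnosis", LOW, {"rare disease X": HIGH}),
]
print(record_exposure({"zip_code": "75013", "diagnosis": "flu"}, attrs))             # 2
print(record_exposure({"zip_code": "75013", "diagnosis": "rare disease X"}, attrs))  # 3
```

The per-value override captures the case where an otherwise poorly exposed attribute has specific values (for instance rare ones) that an attacker is more likely to find in outside sources.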

Paper Structure

This paper contains 14 sections, 2 figures, 13 tables.

Introduction
Background and Positioning against EU and US Legislations
  Re-identification Risk Calculated based on two Criteria: Severity and Likelihood
  Positioning the Contribution against Existing EU and US Legislations
    US Approach Limitations
    EU Approach Limitations
    Common Limitations of EU and US Approaches
Computing the Severity (S)
Computing the Likelihood (L)
  Exposure of Attributes (Linkability)
  Assessment of the Inference Vulnerability
  Computing the Likelihood
Computing the re-identification Risk
Conclusions

Figures (2)

Figure 1: Our Exploitability Scale based on Exposure and Inference
Figure 2: Our Re-identification Risk Scale based on Exploitability and Severity
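The two figure captions suggest a two-stage lookup: Exposure and Inference are combined into an Exploitability level, which is then combined with Severity into a re-identification risk level. A minimal Python sketch of that chaining, assuming four-level scales; the matrix cells and the names `EXPLOITABILITY`, `RISK`, `lookup`, and `reidentification_risk` are placeholders chosen for illustration, not the values or terminology of the paper's figures.

```python
# Placeholder 4x4 matrices indexed [row - 1][col - 1]; NOT the paper's actual cell values.
EXPLOITABILITY: list[list[int]] = [  # rows: Exposure 1-4, columns: Inference 1-4
    [1, 1, 2, 2],
    [1, 2, 2, 3],
    [2, 2, 3, 4],
    [2, 3, 4, 4],
]
RISK: list[list[int]] = [  # rows: Exploitability 1-4, columns: Severity 1-4
    [1, 1, 2, 2],
    [1, 2, 3, 3],
    [2, 3, 3, 4],
    [2, 3, 4, 4],
]


def lookup(matrix: list[list[int]], row: int, col: int) -> int:
    """Read one cell of a 4-level risk matrix using 1-based levels."""
    if not (1 <= row <= 4 and 1 <= col <= 4):
        raise ValueError("levels must be between 1 and 4")
    return matrix[row - 1][col - 1]


def reidentification_risk(exposure: int, inference: int, severity: int) -> int:
    """Stage 1: Exposure x Inference -> Exploitability; stage 2: Exploitability x Severity -> risk."""
    exploitability = lookup(EXPLOITABILITY, exposure, inference)
    return lookup(RISK, exploitability, severity)


print(reidentification_risk(exposure=4, inference=3, severity=2))  # 3
```

A lookup matrix, unlike a plain product, lets an analyst encode asymmetric judgments (for example, treating high exposure as more dangerous than high inference vulnerability) simply by editing cells.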
