Model-Agnostic Utility-Preserving Biometric Information Anonymization

Chun-Fu Chen, Bill Moriarty, Shaohan Hu, Sean Moran, Marco Pistoia, Vincenzo Piuri, Pierangela Samarati

TL;DR

The paper tackles the privacy-utility trade-off in biometric data by proposing a model-agnostic, utility-preserving anonymization framework. It transforms high-dimensional biometric records using dynamically assembled random sets and a selective weighted-mean operation guided by task-specific feature relevance scores, suppressing sensitive attributes while retaining attributes of interest and additional attributes. Across facial, voice, and motion data, the method achieves strong suppression of identity-related information and stable retention of analytic utility, with robust results under different representations and relevance estimators. This approach enables privacy-preserving public releases of multi-modal biometric data while preserving downstream analytics capabilities, offering a practical path to safer data sharing in research and industry.

Abstract

The recent rapid advancements in both sensing and machine learning technologies have given rise to the universal collection and utilization of people's biometrics, such as fingerprints, voices, retina/facial scans, or gait/motion/gesture data, enabling a wide range of applications including authentication, health monitoring, and much more sophisticated analytics. While providing better user experiences and deeper business insights, the use of biometrics has raised serious privacy concerns due to their intrinsically sensitive nature and the accompanying high risk of leaking sensitive information such as identity or medical conditions. In this paper, we propose a novel modality-agnostic data transformation framework that is capable of anonymizing biometric data by suppressing its sensitive attributes and retaining features relevant to downstream machine learning-based analyses that are of research and business value. We carried out a thorough experimental evaluation using publicly available facial, voice, and motion datasets. Results show that our proposed framework can achieve a high suppression level for sensitive information, while at the same time retaining underlying data utility such that subsequent analyses on the anonymized biometric data can still be carried out with satisfactory accuracy.

Paper Structure (27 sections, 3 equations, 10 figures, 2 tables, 1 algorithm)

This paper contains 27 sections, 3 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: Our utility-preserving biometric information anonymization transforms the original biometric data into an anonymized version such that the sensitive attributes can no longer be recognized from the transformed data records, but the rest of the attributes (which are useful and nonsensitive) can still be used for valuable analytics tasks.
  • Figure 2: Varying set purity $t$. Higher $t$ leads to better attribute-of-interest recognition accuracy as each original data record $\mathbf{d}$ is combined with more records sharing $\mathbf{d}$'s attribute value.
  • Figure 3: Varying weight $w$. Under $r_p=1\%$, our method works well regardless of the weight since only $1\%$ of features are retained. On the other hand, with $r_p=50\%$, the level of mixture decreases as $w$ increases: the anonymized data record stays much closer to the original one, since a large portion of features is retained via the higher $r_p$ and anchored in place via the higher $w$.
  • Figure 4: Varying set size $g$. With larger set size, the level of mixture increases as mixing more data leads to better anonymization without affecting the recognition of the attribute of interest.
  • Figure 5: Varying feature retention ratio $r_p$. Retaining more features increases the similarity between the original and the anonymized data, and can help increase attribute recognition accuracy, but might lead to a lower mixture. Hence, a trade-off needs to be made here.
  • ...and 5 more figures
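
The captions above name four parameters: set purity $t$, weight $w$, set size $g$, and feature retention ratio $r_p$. The following is a minimal sketch of how they might interact, not the paper's algorithm. It assumes that each record is mixed with a random set of $g$ other records, that a fraction $t$ of that set shares the record's attribute-of-interest value, that the top $r_p$ fraction of features by relevance score is "retained", and that retained features are a weighted mean anchored on the original record with weight $w$ while all other features take the plain set mean. The function name `anonymize`, its signature, and these exact mixing rules are all hypothetical.

```python
import numpy as np

def anonymize(X, y, relevance, g=8, t=0.75, w=0.5, r_p=0.1, seed=0):
    """Hypothetical sketch of set-based mixing; not the authors' implementation.

    X: (n, d) feature matrix, y: (n,) attribute-of-interest labels,
    relevance: (d,) per-feature relevance scores (higher = more relevant).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Assumption: "retained" features are the top r_p fraction by relevance.
    n_keep = max(1, int(round(r_p * d)))
    retained = np.zeros(d, dtype=bool)
    retained[np.argsort(relevance)[::-1][:n_keep]] = True

    # Assumption: purity t is the share of the set with the same label.
    n_same = int(round(t * g))
    idx = np.arange(n)
    out = np.empty_like(X, dtype=float)

    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (idx != i))
        other = np.flatnonzero(y != y[i])
        k_same = min(n_same, same.size)
        k_other = min(g - k_same, other.size)
        # Assemble a fresh random set for every record.
        picks = np.concatenate([
            rng.permutation(same)[:k_same],
            rng.permutation(other)[:k_other],
        ])
        if picks.size == 0:
            raise ValueError("not enough records to assemble a set")

        set_mean = X[picks].mean(axis=0)
        # Non-retained features: plain mean over the random set.
        out[i] = set_mean
        # Retained features: weighted mean anchored on the original record.
        out[i, retained] = w * X[i, retained] + (1.0 - w) * set_mean[retained]
    return out
```

Under these assumptions, setting `w=1.0` and `r_p=1.0` returns the input unchanged, while a small `r_p` or a small `w` pulls each record toward the mean of its set. This matches the direction of the trade-offs described in the captions for Figures 3 and 5, but the magnitudes reported in the paper cannot be reproduced from this sketch.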