Model-Agnostic Utility-Preserving Biometric Information Anonymization
Chun-Fu Chen, Bill Moriarty, Shaohan Hu, Sean Moran, Marco Pistoia, Vincenzo Piuri, Pierangela Samarati
TL;DR
The paper tackles the privacy-utility trade-off in biometric data by proposing a model-agnostic, utility-preserving anonymization framework. It transforms high-dimensional biometric records using dynamically assembled random sets and a selective weighted-mean operation guided by per-feature task-relevance scores. This suppresses sensitive attributes while retaining both the attributes of interest and additional non-sensitive attributes. Across facial, voice, and motion data, the method strongly suppresses identity-related information and stably retains analytic utility, and these results hold across different data representations and relevance estimators. The approach enables privacy-preserving public release of multi-modal biometric data without giving up downstream analytics, offering a practical path to safer data sharing in research and industry.
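The summary above names the ingredients (random sets, a weighted mean, relevance scores) but not how they combine. The following is a minimal, hypothetical sketch of one way they could fit together. It is not the authors' implementation. It assumes that records are fixed-length feature vectors in a NumPy array, that per-feature relevance scores in [0, 1] come from some external estimator, and that mixing weights are drawn from a Dirichlet distribution. The names `anonymize`, `relevance`, and `set_size` are illustrative only.

```python
import numpy as np

def anonymize(X, relevance, set_size=4, seed=None):
    """Hypothetical sketch of relevance-guided weighted-mean anonymization.

    X: (n, d) array of biometric feature vectors (e.g., embeddings).
    relevance: (d,) scores in [0, 1]; 1 keeps a feature as-is, 0 fully mixes it.
    set_size: records per random set, including the record itself (assumption).
    """
    X = np.asarray(X, dtype=float)
    r = np.clip(np.asarray(relevance, dtype=float), 0.0, 1.0)
    n, d = X.shape
    if not 2 <= set_size <= n:
        raise ValueError("set_size must be between 2 and the number of records")
    rng = np.random.default_rng(seed)
    out = np.empty_like(X)
    for i in range(n):
        # Assemble a fresh random set: record i plus (set_size - 1) other records.
        others = rng.choice(np.delete(np.arange(n), i), size=set_size - 1, replace=False)
        members = np.concatenate(([i], others))
        # Random convex weights (assumption), so the mean stays within the data range.
        w = rng.dirichlet(np.ones(set_size))
        mixed = w @ X[members]
        # Per-feature blend: high-relevance features keep their values,
        # low-relevance features are replaced by the set's weighted mean.
        out[i] = r * X[i] + (1.0 - r) * mixed
    return out

# Usage: feature 0 is fully task-relevant, feature 2 is treated as sensitive.
X = np.random.default_rng(0).normal(size=(10, 3))
Z = anonymize(X, relevance=[1.0, 0.5, 0.0], set_size=4, seed=1)
```

Using a continuous per-feature blend instead of a hard keep/replace rule is a design choice made here for simplicity. A thresholded selection would be an equally plausible reading of "selective".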
Abstract
Recent rapid advances in both sensing and machine learning technologies have led to the widespread collection and use of people's biometrics, such as fingerprints, voices, retina/facial scans, and gait/motion/gesture data. These enable a wide range of applications, including authentication, health monitoring, and more sophisticated analytics. While biometrics provide better user experiences and deeper business insights, their use raises serious privacy concerns because of their intrinsically sensitive nature and the accompanying high risk of leaking sensitive information such as identity or medical conditions. In this paper, we propose a novel modality-agnostic data transformation framework that anonymizes biometric data by suppressing its sensitive attributes while retaining the features relevant to downstream machine learning-based analyses of research and business value. We carried out a thorough experimental evaluation on publicly available facial, voice, and motion datasets. Results show that our framework achieves a high suppression level for sensitive information while retaining the underlying data utility, so that subsequent analyses on the anonymized biometric data still yield satisfactory accuracy.
