ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets
Ziyu Wang, Anil Kanduri, Seyed Amir Hossein Aqajari, Salar Jafarlou, Sanaz R. Mousavi, Pasi Liljeberg, Shaista Malik, Amir M. Rahmani
TL;DR
ECG data carry biometric information that can enable re-identification of individuals in real-world datasets. The authors use SHAP explanations with transparent classifiers to assess re-identification risk on five diverse ECG datasets (223 participants) using PQRST-based features. They report re-identification accuracies of approximately 0.755 for gender, 0.671 for age group, and 0.819 for participant ID, and use SHAP to identify the most influential features, such as the R-S interval, S-R amplitude, T-R amplitude, and P-Q interval. These results offer actionable guidance for designing privacy-preserving anonymization techniques and show clinicians which ECG features pose the greatest threat to patient privacy.
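The features named above are intervals and amplitude differences between the P, Q, R, S, and T fiducial points of a heartbeat. The sketch below shows one way such features could be computed, assuming fiducial points have already been detected for each beat. The `Fiducial` type, the `pqrst_features` and `participant_profile` helpers, the feature names, and the definitions used (intervals as time differences in seconds, amplitudes as absolute voltage differences in millivolts, per-participant averaging across beats) are hypothetical illustrations, not the authors' exact feature definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fiducial:
    """One detected fiducial point of a heartbeat (hypothetical type)."""
    t: float    # time in seconds
    amp: float  # amplitude in millivolts


def pqrst_features(beat: dict[str, Fiducial]) -> dict[str, float]:
    """Compute illustrative PQRST features for one beat.

    `beat` maps "P", "Q", "R", "S", "T" to their fiducial points. Intervals are
    time differences and amplitudes are absolute voltage differences; these
    are assumed definitions, not necessarily those used in the paper.
    """
    p, q, r, s, t = (beat[k] for k in "PQRST")
    return {
        "PQ_interval": q.t - p.t,           # P point to Q point, seconds
        "RS_interval": s.t - r.t,           # R peak to S point, seconds
        "SR_amplitude": abs(r.amp - s.amp),  # R vs. S voltage gap, mV
        "TR_amplitude": abs(r.amp - t.amp),  # R vs. T voltage gap, mV
    }


def participant_profile(beats: list[dict[str, Fiducial]]) -> dict[str, float]:
    """Average each feature across a participant's beats (assumed aggregation)."""
    if not beats:
        raise ValueError("at least one beat is required")
    per_beat = [pqrst_features(b) for b in beats]
    return {name: sum(f[name] for f in per_beat) / len(per_beat) for name in per_beat[0]}


# Made-up fiducial points for a single beat, for illustration only.
beat = {
    "P": Fiducial(0.10, 0.15),
    "Q": Fiducial(0.26, -0.10),
    "R": Fiducial(0.30, 1.20),
    "S": Fiducial(0.34, -0.30),
    "T": Fiducial(0.60, 0.35),
}
print(pqrst_features(beat))
```

Averaging over beats is only one plausible aggregation; per-beat features could also be passed directly to a classifier, with one label (gender, age group, or participant ID) per beat.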
Abstract
While ECG data is crucial for diagnosing and monitoring heart conditions, it also contains unique biometric information that poses significant privacy risks. Existing ECG re-identification studies rely on exhaustive analysis of numerous deep learning features and provide only ad-hoc explainability to support clinicians' decision making. In this work, we investigate the explainability of ECG re-identification risks using transparent machine learning models. We use SHapley Additive exPlanations (SHAP) analysis to identify and explain the key features contributing to re-identification risk. We conduct an empirical analysis of re-identification risks using ECG data from five diverse real-world datasets encompassing 223 participants. Using transparent machine learning models, we show that different ECG features contribute to varying degrees to the re-identification of individuals, with accuracies of 0.76 for gender, 0.67 for age group, and 0.82 for participant ID re-identification. Our approach provides valuable insights for clinical experts and guides the development of effective privacy-preserving mechanisms. Further, our findings emphasize the need for robust privacy measures in real-world health applications and offer detailed, actionable insights for improving data anonymization techniques.
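SHAP attributes a model's output to its input features using Shapley values from cooperative game theory. As a rough illustration of the quantity SHAP estimates, the sketch below computes exact Shapley values by brute force over all feature coalitions, replacing features outside a coalition with a single baseline input. It is not the `shap` library's API or the authors' pipeline: `shapley_values`, the toy linear `score` model, its weights, and the input and baseline values are all hypothetical. The `shap` library uses optimized explainers and typically a background dataset rather than one baseline.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of f at x, relative to a single baseline input.

    Features outside a coalition take their baseline value. This needs
    n * 2**n model calls, so it is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Features in the coalition take x's value; the rest take the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical transparent model: a linear score over four PQRST features.
feature_names = ["PQ_interval", "RS_interval", "SR_amplitude", "TR_amplitude"]
weights = [2.0, -1.0, 0.5, 1.5]
bias = 0.1


def score(z: Sequence[float]) -> float:
    return bias + sum(w * v for w, v in zip(weights, z))


x = [0.16, 0.04, 1.5, 0.85]        # made-up features of one participant
baseline = [0.15, 0.05, 1.2, 0.9]  # made-up reference, e.g. a population mean
phi = shapley_values(score, x, baseline)

# Rank features by the magnitude of their contribution.
for name, contribution in sorted(zip(feature_names, phi), key=lambda p: -abs(p[1])):
    print(f"{name}: {contribution:+.3f}")
```

For a linear model such as `score`, each attribution reduces to the weight times the feature's deviation from the baseline, and the attributions sum to `score(x) - score(baseline)` (the efficiency property). Because exhaustive enumeration grows exponentially with the number of features, practical SHAP implementations rely on model-specific or sampling-based approximations.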
