ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets

Ziyu Wang, Anil Kanduri, Seyed Amir Hossein Aqajari, Salar Jafarlou, Sanaz R. Mousavi, Pasi Liljeberg, Shaista Malik, Amir M. Rahmani

TL;DR

ECG data carry biometric information that can enable re-identification in real-world datasets. The authors use SHAP explanations with transparent classifiers to assess re-identification risk on five diverse ECG datasets (223 participants) using PQRST-based features. They report gender, age-group, and participant-ID re-identification accuracies of approximately 0.755, 0.671, and 0.819, respectively, and use SHAP to identify the most influential features, such as the R-S interval, S-R amplitude, T-R amplitude, and P-Q interval. These results offer actionable guidance for designing privacy-preserving anonymization techniques and show clinicians which ECG features pose the greatest threat to patient privacy.
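To make "PQRST-based features" concrete, the sketch below computes the four features named above from the fiducial points of a single heartbeat. The container class `BeatFiducials` and the function `pqrst_features` are hypothetical names, and the sign conventions are assumptions rather than definitions taken from the paper: an "X-Y interval" is read as the time from peak X to peak Y in seconds, and an "X-Y amplitude" as amp(X) minus amp(Y). Peak detection itself is assumed to have already been done.

```python
from dataclasses import dataclass


@dataclass
class BeatFiducials:
    """Hypothetical container for one heartbeat's PQRST fiducial points.

    idx holds sample positions of the P, Q, R, S, T peaks in the signal;
    amp holds the signal amplitude at each peak (e.g. in mV).
    """
    idx: dict[str, int]
    amp: dict[str, float]


def pqrst_features(beat: BeatFiducials, fs: float) -> dict[str, float]:
    """Compute interval (seconds) and amplitude features for one beat.

    Assumed conventions (not taken from the paper):
      "X-Y interval"  = (index of Y - index of X) / fs
      "X-Y amplitude" = amp[X] - amp[Y]
    """
    def interval(a: str, b: str) -> float:
        # Sample distance converted to seconds via the sampling rate fs (Hz).
        return (beat.idx[b] - beat.idx[a]) / fs

    def amplitude(a: str, b: str) -> float:
        # Signed amplitude difference between two peaks.
        return beat.amp[a] - beat.amp[b]

    return {
        "P-Q interval": interval("P", "Q"),
        "R-S interval": interval("R", "S"),
        "S-R amplitude": amplitude("S", "R"),
        "T-R amplitude": amplitude("T", "R"),
    }


if __name__ == "__main__":
    # Illustrative (made-up) peak positions for a 500 Hz recording.
    beat = BeatFiducials(
        idx={"P": 100, "Q": 150, "R": 170, "S": 190, "T": 330},
        amp={"P": 0.15, "Q": -0.10, "R": 1.20, "S": -0.30, "T": 0.35},
    )
    print(pqrst_features(beat, fs=500.0))
```

Computing features per beat, rather than over a whole recording, keeps each feature vector small and interpretable, which is what makes a transparent classifier and a per-feature SHAP attribution meaningful.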

Abstract

While ECG data are crucial for diagnosing and monitoring heart conditions, they also contain unique biometric information that poses significant privacy risks. Existing ECG re-identification studies rely on exhaustive analysis of numerous deep learning features, which limits them to ad-hoc explainability for clinicians' decision-making. In this work, we investigate the explainability of ECG re-identification risks using transparent machine learning models. We use SHapley Additive exPlanations (SHAP) analysis to identify and explain the key features contributing to re-identification risks. We conduct an empirical analysis of re-identification risks using ECG data from five diverse real-world datasets encompassing 223 participants. Using transparent machine learning models, we show that ECG features differ in how much they contribute to re-identifying individuals, with accuracies of 0.76 for gender, 0.67 for age group, and 0.82 for participant ID re-identification. Our approach provides valuable insights for clinical experts and guides the development of effective privacy-preserving mechanisms. Further, our findings underscore the need for robust privacy measures in real-world health applications and offer detailed, actionable insights for improving data anonymization techniques.
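The SHAP attributions mentioned in the abstract are Shapley values: each feature's weighted average marginal contribution to the model output over all coalitions of the other features. As a minimal sketch, assuming a single baseline point stands in for "absent" features, the function below computes exact Shapley values by enumerating every coalition. It is not the `shap` library's API and not necessarily the authors' setup (SHAP explainers typically average over a background dataset and use faster approximations); `exact_shapley` is a hypothetical name, and `f` could be, for example, a transparent classifier's predicted probability for one participant ID.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of f at x relative to a single baseline point.

    Features outside a coalition are replaced by their baseline value.
    This needs O(2^n) model evaluations, so it only suits a few features.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Model output with coalition features taken from x, the rest from baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Toy linear model over three hypothetical ECG features.
    def linear(z: Sequence[float]) -> float:
        return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2] + 3.0

    print(exact_shapley(linear, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]))
```

For the toy linear model, each attribution reduces to weight times (feature value minus baseline), giving roughly [2.0, -2.0, 1.5], and the attributions sum to f(x) minus f(baseline). This additivity is what lets a SHAP plot rank features such as the R-S interval by their contribution to a re-identification decision.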

Paper Structure

This paper contains 10 sections, 3 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Overview of the threat model illustrating ECG data aggregation from various sources to e-health platforms, creating an attack surface for potential client re-identification.
  • Figure 2: ECG signal with detected PQRST peaks and key features, highlighting its biometric properties.
  • Figure 3: SHAP Analysis for Re-identification Tasks.