A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures

Thanh Tam Nguyen, Thanh Trung Huynh, Zhao Ren, Thanh Toan Nguyen, Phi Le Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen

TL;DR

This article presents the first thorough survey of privacy attacks on model explanations and their countermeasures. It analyses the relevant research papers through a connected taxonomy that categorises privacy attacks and countermeasures by the explanations they target.

Abstract

As the adoption of explainable AI (XAI) continues to expand, the urgency to address its privacy implications intensifies. Despite a growing corpus of research in AI privacy and explainability, little attention has been paid to privacy-preserving model explanations. This article presents the first thorough survey of privacy attacks on model explanations and their countermeasures. Our contribution to this field comprises a thorough analysis of research papers with a connected taxonomy that facilitates the categorisation of privacy attacks and countermeasures based on the targeted explanations. This work also includes an initial investigation into the causes of privacy leaks. Finally, we discuss unresolved issues and prospective research directions uncovered in our analysis. This survey aims to be a valuable resource for the research community and offers clear insights for those new to this domain. To support ongoing research, we have established an online resource repository, which will be continuously updated with new and relevant findings. Interested readers are encouraged to access our repository at https://github.com/tamlhp/awesome-privex.

Paper Structure (53 sections, 32 equations, 17 figures, 4 tables)

Figures (17)

  • Figure 1: This work vs. existing surveys. Explainable AI involves explanation and interpretable methods (e.g. bodria2023benchmarking, guidotti2018survey, gilpin2018explaining). Adversarial AI includes adversarial attacks on ML models (e.g. machado2021adversarial, biggio2018wild). Privacy AI involves privacy issues in ML (e.g. rigaki2023survey, hu2022membership, liu2021machine). Others (ferry2023sok, baniecki2024adversarial) discuss exploits on model explanations. Our survey offers the first complete picture of privacy attacks, leaks, and defenses in explainable AI.
  • Figure 2: Our taxonomy of privacy attacks and countermeasures on model explanations. "Exploit" arrows indicate existing works about privacy attacks on targeted explanations. "Support" arrows indicate existing works about privacy countermeasures for corresponding explanations. Some countermeasures (e.g. Privacy-Preserving ML) target privacy attacks directly and their arrows are omitted for brevity's sake.
  • Figure 3: Feature-based explanations via backpropagation.
  • Figure 4: Decision boundaries between a human analyst and a learnt model.
  • Figure 5: Membership inference attacks.
  • ...and 12 more figures

Theorems & Definitions (2)

  • Example 1
  • Example 2