A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures

Thanh Tam Nguyen; Thanh Trung Huynh; Zhao Ren; Thanh Toan Nguyen; Phi Le Nguyen; Hongzhi Yin; Quoc Viet Hung Nguyen

TL;DR

This article presents the first thorough survey of privacy attacks on model explanations and their countermeasures. It analyses the research literature through a connected taxonomy that categorises privacy attacks and countermeasures by the explanations they target.

Abstract

As the adoption of explainable AI (XAI) continues to expand, the urgency to address its privacy implications intensifies. Despite a growing corpus of research in AI privacy and explainability, little attention has been paid to privacy-preserving model explanations. This article presents the first thorough survey of privacy attacks on model explanations and their countermeasures. Our contribution to this field comprises a thorough analysis of research papers with a connected taxonomy that facilitates the categorisation of privacy attacks and countermeasures based on the targeted explanations. This work also includes an initial investigation into the causes of privacy leaks. Finally, we discuss unresolved issues and prospective research directions uncovered in our analysis. This survey aims to be a valuable resource for the research community and offers clear insights for those new to this domain. To support ongoing research, we have established an online resource repository, which will be continuously updated with new and relevant findings. Interested readers are encouraged to access our repository at https://github.com/tamlhp/awesome-privex.

Paper Structure (53 sections, 32 equations, 17 figures, 4 tables)

Introduction
Comparisons with existing surveys
Paper collection methodology
Contributions of the article
Organisation of the article
Model Explanations
Feature-based Explanations
Interpretable Surrogates
Example-based Explanations
Counterfactual Explanations
Privacy Attacks
Membership Inference Attacks (MIA)
Linkage Attacks
Reconstruction Attacks
Attribute/Feature Inference Attacks
...and 38 more sections

Figures (17)

Figure 1: This work vs. existing surveys. Explainable AI involves explanation and interpretable methods (e.g. bodria2023benchmarking, guidotti2018survey, gilpin2018explaining). Adversarial AI includes adversarial attacks on ML models (e.g. machado2021adversarial, biggio2018wild). Privacy AI involves privacy issues in ML (e.g. rigaki2023survey, hu2022membership, liu2021machine). Others (ferry2023sok, baniecki2024adversarial) discuss exploits on model explanations. Our survey offers the first complete picture of privacy attacks, leaks, and defenses in explainable AI.
Figure 2: Our taxonomy of privacy attacks and countermeasures on model explanations. "Exploit" arrows indicate existing works about privacy attacks on targeted explanations. "Support" arrows indicate existing works about privacy countermeasures for corresponding explanations. Some countermeasures (e.g. Privacy-Preserving ML) target privacy attacks directly, and their arrows are omitted for brevity.
Figure 3: Feature-based explanations via backpropagation.
Figure 4: Decision boundaries of a human analyst and a learnt model.
Figure 5: Membership inference attacks.
...and 12 more figures

Theorems & Definitions (2)

Example 1
Example 2
