Reconciling Privacy and Explainability in High-Stakes: A Systematic Inquiry

Supriya Manna, Niladri Sett

TL;DR

This paper tackles the challenge of reconciling the Right-to-Privacy with the Right-to-Explanation in high-stakes AI by systematically studying how differential privacy (DP) interacts with post-hoc explainers. Through a chest X-ray use case, it shows that popular gradient-based explanations fail to remain faithful under DP due to altered representations and sensitivities, even when model accuracy remains similar to the non-private baseline. To address this, the authors introduce the Localization Assumption (LA) and the Privacy Invariance Score (PIS), and they show that DP training degrades explainability in a fundamental way, not just a simple privacy-utility trade-off. As a practical alternative, they explore Local Differential Privacy (LDP) applied to explanations and outline a novel industrial software pipeline that preserves privacy while providing acceptable explanations to clinicians, though with notable compromises for end-user privacy guarantees. The work collectively advances a principled framework for evaluating private explanations and points to feasible directions for integrating RTP and RTE in real-world high-stakes deployments.

Abstract

Deep learning's preponderance across scientific domains has reshaped high-stakes decision-making, making it essential to follow rigorous operational frameworks that include both Right-to-Privacy (RTP) and Right-to-Explanation (RTE). This paper examines the complexities of combining these two requirements. For RTP, we focus on Differential Privacy (DP), which is considered the current gold standard for privacy-preserving machine learning due to its strong quantitative guarantee of privacy. For RTE, we focus on post-hoc explainers: they are the go-to option for model auditing as they operate independently of model training. We formally investigate DP models and various commonly used post-hoc explainers: we study how to evaluate these explainers subject to RTP and analyze the intrinsic interactions between DP models and these explainers. Furthermore, our work sheds light on how RTP and RTE can be effectively combined in high-stakes applications. Our study concludes by outlining an industrial software pipeline, with the example of a widely used use case, that respects both RTP and RTE requirements.
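
For readers unfamiliar with the "strong quantitative guarantee" mentioned in the abstract, it is usually stated as the standard $(\varepsilon, \delta)$-differential privacy condition. The notation below ($\mathcal{M}$ for the randomized training mechanism, $D$ and $D'$ for neighboring datasets, $S$ for a set of outputs) is the common textbook form and is an assumption here, not quoted from the paper:

```latex
% (epsilon, delta)-differential privacy, standard textbook statement.
% Symbols are illustrative and may differ from the paper's own notation.
\[
  \Pr\bigl[\mathcal{M}(D) \in S\bigr]
  \;\le\;
  e^{\varepsilon}\,\Pr\bigl[\mathcal{M}(D') \in S\bigr] + \delta
  \qquad
  \text{for all neighboring } D, D' \text{ and all measurable } S .
\]
```

Smaller $\varepsilon$ and $\delta$ mean the output distribution changes less when a single record is added or removed, which is what makes the guarantee quantitative.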

Paper Structure (34 sections, 10 equations, 7 figures, 4 tables)

This paper contains 34 sections, 10 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Performance of explainers (ResNet-34)
  • Figure 3: Performance of explainers (EfficientNet-V2)
  • Figure 4: dCKA heatmaps for ResNet-34
  • Figure 7: Outline of the Software
  • Figure 8: dCKA heatmap for ResNet-34 for CIFAR-10.
  • ...and 2 more figures