Reconciling Privacy and Explainability in High-Stakes: A Systematic Inquiry
Supriya Manna, Niladri Sett
TL;DR
This paper tackles the challenge of reconciling the Right-to-Privacy (RTP) with the Right-to-Explanation (RTE) in high-stakes AI by systematically studying how differential privacy (DP) interacts with post-hoc explainers. Through a chest X-ray use case, it shows that popular gradient-based explanations do not remain faithful under DP because the model's representations and sensitivities change, even when accuracy stays close to the non-private baseline. To address this, the authors introduce the Localization Assumption (LA) and the Privacy Invariance Score (PIS), and they show that DP training degrades explainability in a fundamental way that goes beyond a simple privacy-utility trade-off. As a practical alternative, they explore Local Differential Privacy (LDP) applied to explanations and outline a novel industrial software pipeline that preserves privacy while providing acceptable explanations to clinicians, though with notable compromises in end-user privacy guarantees. Taken together, the work advances a principled framework for evaluating private explanations and points to feasible directions for integrating RTP and RTE in real-world high-stakes deployments.
Abstract
Deep learning's preponderance across scientific domains has reshaped high-stakes decision-making, making it essential to follow rigorous operational frameworks that include both Right-to-Privacy (RTP) and Right-to-Explanation (RTE). This paper examines the complexities of combining these two requirements. For RTP, we focus on differential privacy (DP), which is considered the current gold standard for privacy-preserving machine learning due to its strong quantitative guarantee of privacy. For RTE, we focus on post-hoc explainers, which are the go-to option for model auditing because they operate independently of model training. We formally investigate DP models and various commonly used post-hoc explainers: we study how to evaluate these explainers subject to RTP, and we analyze the intrinsic interactions between DP models and these explainers. Furthermore, our work sheds light on how RTP and RTE can be effectively combined in high-stakes applications. Our study concludes by outlining an industrial software pipeline, illustrated with a widely used use case, that respects both RTP and RTE requirements.
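For readers unfamiliar with the quantitative guarantee mentioned above, the standard textbook definition of (ε, δ)-differential privacy can be stated as follows. The symbols here are our own notation and are not taken from the paper: M is a randomized mechanism (for example, a private training procedure), D and D' are datasets that differ in a single record, and S is any measurable set of outputs.

```latex
% Standard (\varepsilon, \delta)-DP definition; notation is assumed, not the paper's.
% \mathcal{M}: randomized mechanism, D and D': neighboring datasets, S: set of outputs.
\Pr\bigl[\mathcal{M}(D) \in S\bigr]
  \;\le\; e^{\varepsilon}\,\Pr\bigl[\mathcal{M}(D') \in S\bigr] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all measurable } S \subseteq \operatorname{Range}(\mathcal{M}).
```

Smaller values of ε and δ mean that the output distribution changes less when one record is added or removed, which is what makes the guarantee quantitative.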
