Differential Privacy for Anomaly Detection: Analyzing the Trade-off Between Privacy and Explainability
Fatima Ezzeddine, Mirna Saad, Omran Ayoub, Davide Andreoletti, Martin Gjoreski, Ihab Sbeity, Marc Langheinrich, Silvia Giordano
TL;DR
This work examines how differential privacy (DP) influences anomaly detection (AD) performance and the explainability of AD models. By applying DP at the training-data level and evaluating two unsupervised AD algorithms, Isolation Forest (iForest) and Local Outlier Factor (LOF), across three datasets, the authors assess changes in accuracy and SHAP-based explanations using ShapGAP-Euclidean, ShapGAP-Cosine, and ShapLength metrics. The results reveal a privacy-explainability trade-off: DP generally degrades AD performance and perturbs SHAP explanations, with LOF showing greater robustness to DP than iForest, and effects varying by dataset and privacy budget $\varepsilon$. These findings inform practical privacy budgeting for AD and motivate future work on mitigation strategies and alternative explainability techniques under differential privacy.
Abstract
Anomaly detection (AD), also referred to as outlier detection, is a statistical process aimed at identifying observations within a dataset that significantly deviate from the expected pattern of the majority of the data. Such a process finds wide application in various fields, such as finance and healthcare. While the primary objective of AD is to yield high detection accuracy, the requirements of explainability and privacy are also paramount. The first ensures the transparency of the AD process, while the second guarantees that no sensitive information is leaked to untrusted parties. In this work, we explore the trade-off between Explainable AI (XAI), applied through SHapley Additive exPlanations (SHAP), and differential privacy (DP). We perform AD with different models and on various datasets, and we thoroughly evaluate the cost of privacy in terms of decreased accuracy and explainability. Our results show that enforcing privacy through DP significantly affects both detection accuracy and explainability, and that the size of this effect depends on both the dataset and the AD model considered. We further show that the visual interpretation of explanations is also influenced by the choice of the AD algorithm.
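To make the evaluation idea concrete, the following is a minimal sketch of two ingredients named in the summary: perturbing training data with a DP mechanism, and comparing explanations with ShapGAP-style distances. It rests on several assumptions that this summary does not confirm: the Laplace mechanism with a per-value scale of (upper - lower) / ε is assumed as the data-level DP mechanism, ShapGAP-Euclidean and ShapGAP-Cosine are assumed to be the mean per-sample Euclidean and cosine distance between the SHAP vectors of the non-private and the DP-trained model, and all function names are hypothetical. SHAP values are taken as given lists of numbers; the sketch does not compute them.

```python
import math
import random


def laplace_sample(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(rows, epsilon, lower, upper, rng):
    """Clip every value to [lower, upper], then add Laplace noise.

    Assumption: each value gets noise of scale (upper - lower) / epsilon,
    with no budget accounting across features or records.
    """
    scale = (upper - lower) / epsilon
    return [
        [min(max(v, lower), upper) + laplace_sample(scale, rng) for v in row]
        for row in rows
    ]


def _cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen here: two all-zero vectors have distance 0,
        # a zero vector against a non-zero one has distance 1.
        return 0.0 if norm_a == norm_b else 1.0
    return 1.0 - dot / (norm_a * norm_b)


def shapgap_euclidean(expl_a, expl_b):
    """Mean per-sample Euclidean distance between two explanation sets (assumed definition)."""
    return sum(math.dist(a, b) for a, b in zip(expl_a, expl_b)) / len(expl_a)


def shapgap_cosine(expl_a, expl_b):
    """Mean per-sample cosine distance between two explanation sets (assumed definition)."""
    return sum(_cosine_distance(a, b) for a, b in zip(expl_a, expl_b)) / len(expl_a)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy training rows with features already scaled to [0, 1].
    noisy_rows = privatize([[0.2, 0.9], [0.5, 0.1]], epsilon=1.0,
                           lower=0.0, upper=1.0, rng=rng)
    # Made-up SHAP vectors for the same two samples under each model.
    baseline_shap = [[0.6, -0.1], [0.0, 0.4]]
    private_shap = [[0.3, 0.2], [0.1, 0.3]]
    print(noisy_rows)
    print(shapgap_euclidean(baseline_shap, private_shap))
    print(shapgap_cosine(baseline_shap, private_shap))
```

In a full pipeline, the noisy rows would be used to train the AD model (for example iForest or LOF), and the two explanation sets would come from a SHAP explainer applied to the non-private and the DP-trained model on the same test samples. A smaller ε gives a larger noise scale, which is the lever behind the trade-off the abstract describes.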
