Differential Privacy for Anomaly Detection: Analyzing the Trade-off Between Privacy and Explainability

Fatima Ezzeddine, Mirna Saad, Omran Ayoub, Davide Andreoletti, Martin Gjoreski, Ihab Sbeity, Marc Langheinrich, Silvia Giordano

TL;DR

This work examines how differential privacy (DP) influences anomaly detection (AD) performance and the explainability of AD models. By applying DP at the training-data level and evaluating two unsupervised AD algorithms, Isolation Forest (iForest) and Local Outlier Factor (LOF), across three datasets, the authors assess changes in accuracy and SHAP-based explanations using ShapGAP-Euclidean, ShapGAP-Cosine, and ShapLength metrics. The results reveal a privacy-explainability trade-off: DP generally degrades AD performance and perturbs SHAP explanations, with LOF showing greater robustness to DP than iForest, and effects varying by dataset and privacy budget $\varepsilon$. These findings inform practical privacy budgeting for AD and motivate future work on mitigation strategies and alternative explainability techniques under differential privacy.
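The pipeline summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the training data is perturbed with the Laplace mechanism using publicly known per-feature bounds (`lower`, `upper`), and it assumes ShapGAP-Euclidean and ShapGAP-Cosine are the mean per-sample Euclidean and cosine distances between the SHAP vectors of two models. The function names are hypothetical, and ShapLength is omitted because this section does not define it.

```python
import numpy as np


def laplace_perturb(X, epsilon, lower, upper, rng):
    """Perturb training data with Laplace noise (assumed mechanism).

    Each feature is clipped to its public bounds, so its sensitivity is
    (upper - lower). Noise scale is sensitivity / epsilon per feature.
    Simplification: epsilon is applied per feature; a full privacy
    accounting would split the budget across features.
    """
    X = np.clip(np.asarray(X, dtype=float), lower, upper)
    sensitivity = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=X.shape)
    return X + noise


def shapgap_euclidean(phi_a, phi_b):
    """Mean Euclidean distance between per-sample explanation vectors."""
    phi_a = np.asarray(phi_a, dtype=float)
    phi_b = np.asarray(phi_b, dtype=float)
    return float(np.mean(np.linalg.norm(phi_a - phi_b, axis=1)))


def shapgap_cosine(phi_a, phi_b):
    """Mean cosine distance (1 - cosine similarity) between explanations."""
    phi_a = np.asarray(phi_a, dtype=float)
    phi_b = np.asarray(phi_b, dtype=float)
    num = np.sum(phi_a * phi_b, axis=1)
    den = np.linalg.norm(phi_a, axis=1) * np.linalg.norm(phi_b, axis=1)
    # Rows with a zero-norm explanation get similarity 0 instead of NaN.
    cos_sim = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(np.mean(1.0 - cos_sim))
```

In use, `phi_a` would hold the SHAP values of a model trained on the original data and `phi_b` those of a model trained on the output of `laplace_perturb` for a given $\varepsilon$, both computed on the same test samples. A smaller $\varepsilon$ yields larger noise, so the two distances would be expected to grow as the privacy budget shrinks.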

Abstract

Anomaly detection (AD), also referred to as outlier detection, is a statistical process aimed at identifying observations within a dataset that significantly deviate from the expected pattern of the majority of the data. Such a process finds wide application in various fields, such as finance and healthcare. While the primary objective of AD is to yield high detection accuracy, the requirements of explainability and privacy are also paramount. The first ensures the transparency of the AD process, while the second guarantees that no sensitive information is leaked to untrusted parties. In this work, we explore the trade-off that arises when Explainable AI (XAI), through SHapley Additive exPlanations (SHAP), is applied together with differential privacy (DP). We perform AD with different models and on various datasets, and we thoroughly evaluate the cost of privacy in terms of decreased accuracy and explainability. Our results show that the enforcement of privacy through DP has a significant impact on detection accuracy and explainability, which depends on both the dataset and the considered AD model. We further show that the visual interpretation of explanations is also influenced by the choice of the AD algorithm.

Paper Structure (25 sections, 2 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Overall scheme of the experimental setup
  • Figure 2: Fidelity accuracy of iForest and average ShapGAP-Euclidean distance, ShapGAP-Cosine distance, and ShapLength, computed over the SHAP explanations of the iForest models for each value of $\varepsilon$, on the (a) Mammography, (b) Thyroid, and (c) Bank datasets. The vertical dashed line marks the without-DP baseline on the x-axis.
  • Figure 3: Fidelity accuracy of LOF and average ShapGAP-Euclidean distance, ShapGAP-Cosine distance, and ShapLength, computed over the SHAP explanations of the LOF models for each value of $\varepsilon$, on the (a) Mammography, (b) Thyroid, and (c) Bank datasets. The vertical dashed line marks the without-DP baseline on the x-axis.
  • Figure 4: Distribution of iForest ShapGAP-Cosine distances on the (a) Mammography, (b) Thyroid, and (c) Bank datasets for the various $\varepsilon$ values
  • Figure 5: Distribution of iForest ShapGAP-Euclidean distances on the (a) Mammography, (b) Thyroid, and (c) Bank datasets for the various $\varepsilon$ values
  • ...and 4 more figures

Theorems & Definitions (1)

  • Definition: Differential Privacy
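
For reference, the standard textbook form of $\varepsilon$-differential privacy is given below; the paper's own statement may use different notation. A randomized mechanism $\mathcal{M}$ is $\varepsilon$-differentially private if, for every pair of datasets $D$ and $D'$ that differ in a single record and every set of outputs $S$:

$$
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
$$

A smaller $\varepsilon$ forces the two output distributions closer together, which means stronger privacy and, typically, more noise.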