Verifying Machine Unlearning with Explainable AI

Àlex Pujol Vidal, Anders S. Johansen, Mohammad N. S. Jahromi, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund

TL;DR

This proof-of-concept introduces feature importance as an innovative verification step for MU, expanding beyond traditional metrics and demonstrating that the unlearning techniques reduce the model's reliance on undesired patterns.

Abstract

We investigate the effectiveness of Explainable AI (XAI) in verifying Machine Unlearning (MU) within the context of harbor front monitoring, focusing on data privacy and regulatory compliance. With the increasing need to adhere to privacy legislation such as the General Data Protection Regulation (GDPR), the traditional approach of retraining ML models after data deletions proves impractical due to its complexity and resource demands. MU offers a solution by enabling models to selectively forget specific learned patterns without full retraining. We explore various removal techniques, including data relabeling and model perturbation. Then, we leverage attribution-based XAI to discuss the effects of unlearning on model performance. Our proof-of-concept introduces feature importance as an innovative verification step for MU, expanding beyond traditional metrics and demonstrating that these techniques reduce the model's reliance on undesired patterns. Additionally, we propose two novel XAI-based metrics, Heatmap Coverage (HC) and Attention Shift (AS), to evaluate the effectiveness of these methods. This approach not only highlights how XAI can complement MU by providing effective verification, but also sets the stage for future research to enhance their joint integration.

Paper Structure

This paper contains 20 sections, 3 equations, 5 figures, and 1 table.

Figures (5)

  • Figure 1: XAI and MU applied in the harbor front use-case, mitigating GDPR privacy violation. XAI heatmaps show that the trained model captures human patterns while the unlearned model ignores them.
  • Figure 2: Our framework uses explainability to verify unlearning. Upon a removal request, the model undergoes unlearning to exclude people from the counting task, with SIDU verifying the unlearning success.
  • Figure 3: Examples from the original dataset and the relabeled dataset. The color of a bounding box denotes the object class, with green, red, blue, and cyan denoting human, bicycle, vehicle, and motorcycle, respectively.
  • Figure 4: Heatmaps generated by SIDU for different model configurations, with importance visualized from least (blue) to most (red).
  • Figure 5: Difference between the heatmap of the original model and that of the unlearned model. Green indicates increased attention, red indicates decreased attention, and white indicates minimal change.
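
The difference visualization described for Figure 5 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes both heatmaps are arrays of the same shape normalized to [0, 1], and the function name `attention_difference` and the threshold `tol` are hypothetical. It also does not reproduce the Heatmap Coverage or Attention Shift metrics, whose definitions are not given in this summary.

```python
import numpy as np


def attention_difference(original, unlearned, tol=0.05):
    """Compare two attribution heatmaps (assumed normalized to [0, 1]).

    Returns the signed difference (unlearned - original) and an RGB image
    where green marks increased attention, red marks decreased attention,
    and white marks changes smaller than `tol` (a hypothetical threshold).
    """
    original = np.asarray(original, dtype=float)
    unlearned = np.asarray(unlearned, dtype=float)
    if original.shape != unlearned.shape:
        raise ValueError("heatmaps must have the same shape")

    diff = unlearned - original          # positive means more attention
    strength = np.clip(np.abs(diff), 0.0, 1.0)

    rgb = np.ones(diff.shape + (3,))     # start from a white image
    increased = diff > tol
    decreased = diff < -tol

    # Green: fade the red and blue channels in proportion to the change.
    rgb[increased, 0] = 1.0 - strength[increased]
    rgb[increased, 2] = 1.0 - strength[increased]
    # Red: fade the green and blue channels in proportion to the change.
    rgb[decreased, 1] = 1.0 - strength[decreased]
    rgb[decreased, 2] = 1.0 - strength[decreased]
    return diff, rgb
```

The returned `rgb` array can be displayed with any image viewer that accepts float RGB values in [0, 1]; regions that stay white are those where the two models attend almost equally.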