Investigating the Relationship Between Debiasing and Artifact Removal using Saliency Maps

Lukasz Sztukiewicz, Ignacy Stępka, Michał Wiliński, Jerzy Stefanowski

TL;DR

The paper investigates how debiasing interacts with artifact removal in computer-vision models by analyzing saliency maps around protected attributes. It introduces four metrics focused on a region of interest (ROI), $RRF$, $ADR$, $DIF$, and $RDDT$, and evaluates post-hoc debiasing methods (ThrOpt, ZhangAL, SavaniAFT) and ClArC artifact-removal variants using saliency explanations from Layer-wise Relevance Propagation (LRP) and Integrated Gradients (IG). The results show that successful debiasing tends to redirect saliency away from protected-attribute regions and that artifact-removal techniques can also improve fairness, indicating a bidirectional relationship between debiasing and artifact removal. The work provides a practical framework for evaluating fairness interventions and supports reproducibility through an open-source DetoxAI-based workflow, with potential implications for integrating artifact-removal strategies into fairness pipelines.

Abstract

The widespread adoption of machine learning systems has raised critical concerns about fairness and bias, making the mitigation of harmful biases essential to AI development. In this paper, we investigate the relationship between debiasing and removing artifacts in neural networks for computer vision tasks. First, we introduce a set of novel XAI-based metrics that analyze saliency maps to assess shifts in a model's decision-making process. Then, we demonstrate that successful debiasing methods systematically redirect model focus away from protected attributes. Finally, we show that techniques originally developed for artifact removal can be effectively repurposed to improve fairness. These findings provide evidence of a bidirectional connection between ensuring fairness and removing artifacts corresponding to protected attributes.
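
To make the idea of "redirecting model focus away from protected attributes" concrete, the sketch below computes the share of absolute saliency that falls inside a region of interest. This is an illustrative quantity only, not one of the paper's four metrics (their definitions are not given in this summary). The function name `roi_relevance_share`, the toy 4x4 maps, and the ROI placement are all assumptions made for the example.

```python
import numpy as np

def roi_relevance_share(saliency, roi_mask, eps=1e-12):
    """Fraction of total absolute relevance located inside the ROI.

    Hypothetical helper for illustration; NOT one of the paper's metrics.
    saliency: 2-D array of attribution scores (e.g. from LRP or IG).
    roi_mask: boolean array of the same shape, True inside the ROI.
    """
    magnitude = np.abs(np.asarray(saliency, dtype=float))
    mask = np.asarray(roi_mask, dtype=bool)
    if magnitude.shape != mask.shape:
        raise ValueError("saliency and roi_mask must have the same shape")
    # eps guards against division by zero for an all-zero saliency map.
    return float(magnitude[mask].sum() / (magnitude.sum() + eps))

# Toy example: the ROI (assumed protected-attribute region) is the bottom half.
roi = np.zeros((4, 4), dtype=bool)
roi[2:, :] = True

# Before debiasing (assumed): relevance spread evenly over the image.
saliency_before = np.ones((4, 4))

# After debiasing (assumed): relevance inside the ROI is reduced.
saliency_after = np.ones((4, 4))
saliency_after[2:, :] = 0.25

share_before = roi_relevance_share(saliency_before, roi)
share_after = roi_relevance_share(saliency_after, roi)
print(f"ROI share before: {share_before:.2f}, after: {share_after:.2f}")
```

A lower share after an intervention would be consistent with the model relying less on the masked region, though a single scalar like this cannot establish that on its own.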

Paper Structure

This paper contains 11 sections, 5 equations, 6 figures.

Figures (6)

  • Figure 1: The left panel shows raw images, and the right panel displays corresponding LRP saliency maps. In the saliency maps, red hues indicate positive relevance towards the true class, while blue hues indicate negative contributions.
  • Figure 2: LRP saliency maps, averaged over a batch of 128 images and grouped by protected attribute (WearingNecktie) and target (Smiling) combinations. PA=1 indicates WearingNecktie, T=1 indicates Smiling.
  • Figure 3: Quantitative metrics for the WearingNecktie-Smiling PA-T classification task, measured on saliency maps generated with LRP. Metrics in the upper row should be minimized; those in the lower row should be maximized.
  • Figure 4: Metric values for the IG attributions and WearingNecktie protected attribute.
  • Figure 5: Metric values for the LRP attributions and WearingHat protected attribute.
  • ...and 1 more figure
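
The group-wise averaging described in the Figure 2 caption (saliency maps averaged over a batch of 128 images and grouped by protected attribute and target) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the function name `mean_saliency_by_group`, the 8x8 map size, and the random stand-in data are hypothetical and do not come from the paper or from DetoxAI.

```python
import numpy as np

def mean_saliency_by_group(saliency, pa, target):
    """Average saliency maps within each (PA, T) combination.

    Hypothetical helper for illustration.
    saliency: array of shape (N, H, W), one map per image.
    pa, target: binary arrays of shape (N,), e.g. PA=1 for WearingNecktie
                and T=1 for Smiling.
    Returns a dict mapping (pa_value, target_value) to an (H, W) mean map.
    Combinations with no samples are omitted.
    """
    saliency = np.asarray(saliency, dtype=float)
    pa = np.asarray(pa)
    target = np.asarray(target)
    groups = {}
    for p in (0, 1):
        for t in (0, 1):
            selected = (pa == p) & (target == t)
            if selected.any():
                groups[(p, t)] = saliency[selected].mean(axis=0)
    return groups

# Random stand-in data: a batch of 128 maps with binary PA and T labels.
rng = np.random.default_rng(0)
batch = rng.normal(size=(128, 8, 8))
pa_labels = rng.integers(0, 2, size=128)
target_labels = rng.integers(0, 2, size=128)

group_means = mean_saliency_by_group(batch, pa_labels, target_labels)
for key, mean_map in sorted(group_means.items()):
    print(f"PA={key[0]}, T={key[1]}: mean map shape {mean_map.shape}")
```

Averaging per group rather than over the whole batch keeps differences between PA=0 and PA=1 visible, which a single pooled average would blur.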