Investigating the Relationship Between Debiasing and Artifact Removal using Saliency Maps
Lukasz Sztukiewicz, Ignacy Stępka, Michał Wiliński, Jerzy Stefanowski
TL;DR
The paper investigates how debiasing interacts with artifact removal in computer-vision models by analyzing saliency maps within regions of interest (ROIs) that correspond to protected attributes. It introduces four ROI-focused metrics ($RRF$, $ADR$, $DIF$, and $RDDT$). It then evaluates post-hoc debiasing methods (ThrOpt, ZhangAL, SavaniAFT) and variants of the ClArC artifact-removal method, using saliency explanations from Layer-wise Relevance Propagation (LRP) and Integrated Gradients (IG). The results show that successful debiasing tends to shift saliency away from protected-attribute regions. They also show that artifact-removal techniques can improve fairness, which indicates a bidirectional relationship between debiasing and artifact removal. The work provides a practical framework for evaluating fairness interventions and supports reproducibility through an open-source workflow built on DetoxAI. It also suggests that artifact-removal strategies could be integrated into fairness pipelines.
Abstract
The widespread adoption of machine learning systems has raised critical concerns about fairness and bias, making the mitigation of harmful biases essential to AI development. In this paper, we investigate the relationship between debiasing and removing artifacts in neural networks for computer-vision tasks. First, we introduce a set of novel XAI-based metrics that analyze saliency maps to assess shifts in a model's decision-making process. Then, we demonstrate that successful debiasing methods systematically redirect model focus away from protected attributes. Finally, we show that techniques originally developed for artifact removal can be effectively repurposed to improve fairness. These findings provide evidence for a bidirectional connection between ensuring fairness and removing artifacts that correspond to protected attributes.
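The metrics named above are not defined in this section. As a rough illustration of the kind of ROI-based saliency measurement involved, the sketch below computes the share of absolute saliency mass that falls inside a protected-attribute mask, and how that share changes after an intervention. `roi_relevance_fraction` and `roi_shift` are hypothetical helpers. They are not the paper's $RRF$, $ADR$, $DIF$, or $RDDT$ definitions. The sketch assumes that saliency maps have already been computed (for example, by LRP or IG) as arrays with the same spatial shape as a binary ROI mask.

```python
import numpy as np

def roi_relevance_fraction(saliency: np.ndarray, roi_mask: np.ndarray) -> float:
    """Hypothetical metric: share of absolute saliency mass inside the ROI."""
    if saliency.shape != roi_mask.shape:
        raise ValueError("saliency and roi_mask must have the same shape")
    mass = np.abs(saliency)          # treat positive and negative relevance alike
    total = mass.sum()
    if total == 0:
        return 0.0                   # no relevance at all: define the share as zero
    return float(mass[roi_mask.astype(bool)].sum() / total)

def roi_shift(sal_before: np.ndarray, sal_after: np.ndarray, roi_mask: np.ndarray) -> float:
    """Hypothetical helper: positive values mean focus moved away from the ROI."""
    return (roi_relevance_fraction(sal_before, roi_mask)
            - roi_relevance_fraction(sal_after, roi_mask))

# Toy example: a 4x4 map whose top-left 2x2 block stands in for a protected attribute.
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True

before = np.zeros((4, 4))
before[:2, :2] = 1.0                 # all relevance sits on the protected region
after = np.ones((4, 4))              # relevance spread evenly after "debiasing"

print(roi_relevance_fraction(before, mask))  # 1.0
print(roi_relevance_fraction(after, mask))   # 0.25
print(roi_shift(before, after, mask))        # 0.75
```

Taking absolute values is a design choice made here so that strongly negative relevance (which LRP and IG can both produce) still counts as model focus. A signed variant would behave differently.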
