Geometric Remove-and-Retrain (GOAR): Coordinate-Invariant eXplainable AI Assessment
Yong-Hyun Park, Junghoon Seo, Bomseok Park, Seongsu Lee, Junghyo Jo
TL;DR
This work identifies geometric limitations in pixel-based attribution benchmarks (ROAR/ROAD), which depend on the pixel coordinate system and on all-or-none removal of features. It proposes Geometric Remove-and-Retrain (GOAR), a coordinate-invariant perturbation method that shifts samples along feature directions and applies a diffusion-based manifold projection (adapted from SDEdit) to keep perturbed data on the data manifold; models are then retrained and cumulative misclassifications are counted. Across synthetic, vision, and tabular datasets, GOAR aligns more closely with ground-truth feature assessments (OpenXAI) and discriminates better among attribution methods, at a high computational cost. The approach offers a more reliable, geometry-aware standard for evaluating feature attributions across diverse data domains, and leaves efficiency improvements and further debiasing as open work.
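The pipeline summarized above (shift along a feature direction, project back toward the data, retrain, count misclassifications) can be illustrated with a loose toy sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the classifier is a plain least-squares linear model, and the diffusion-based projection is replaced by a crude stand-in that clips points to the bounding box of the original data. It is not the authors' implementation.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear classifier with a bias term; labels in {0, 1}."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, 2.0 * y - 1.0, rcond=None)
    return w

def predict(w, X):
    A = np.hstack([X, np.ones((len(X), 1))])
    return (A @ w > 0).astype(int)

def project(X, lo, hi):
    """Hypothetical stand-in for manifold projection (NOT a diffusion model):
    clip every point to the bounding box of the original data."""
    return np.clip(X, lo, hi)

def goar_like_score(X, y, directions, steps):
    """Sum of training misclassifications over step sizes, after
    shift -> project -> retrain. `directions[i]` is the direction that an
    attribution method claims supports the label of sample i."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    unit = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    total = 0
    for eps in steps:
        X_pert = project(X - eps * unit, lo, hi)   # remove the claimed evidence
        w = fit_linear(X_pert, y)                  # retrain on perturbed data
        total += int((predict(w, X_pert) != y).sum())
    return total

# Toy data: two Gaussian blobs separated along the first axis only.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal([-2.0, 0.0], 0.5, size=(n, 2)),
               rng.normal([2.0, 0.0], 0.5, size=(n, 2))])
y = np.repeat([0, 1], n)

# An informative attribution points toward each sample's own class mean;
# an uninformative one points along the irrelevant second axis.
informative = np.where(y[:, None] == 1, [1.0, 0.0], [-1.0, 0.0])
irrelevant = np.tile([0.0, 1.0], (2 * n, 1))

steps = [0.5, 1.0, 1.5, 2.0]
good = goar_like_score(X, y, informative, steps)
bad = goar_like_score(X, y, irrelevant, steps)
```

In this toy setting, shifting along the informative direction merges the two classes and drives the misclassification count up, while shifting along the irrelevant axis leaves the classes separable, so the score separates the two attributions.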
Abstract
Identifying the input features that critically influence a model's output is indispensable for the development of explainable artificial intelligence (XAI). Remove-and-Retrain (ROAR) is a widely accepted approach for assessing the importance of individual pixels: it measures the change in accuracy after the pixels are removed and the model is retrained on the modified dataset. However, we uncover notable limitations in such pixel-perturbation strategies. Viewed from a geometric perspective, these metrics fail to distinguish among feature attribution methods, which compromises the reliability of the evaluation. To address this challenge, we introduce an alternative feature-perturbation approach named Geometric Remove-and-Retrain (GOAR). Through a series of experiments on both synthetic and real datasets, we show that GOAR overcomes the limitations of pixel-centric metrics.
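For readers unfamiliar with ROAR, the remove-and-retrain loop described above can be sketched on tabular toy data. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the model is a nearest-centroid classifier, a single global feature ranking is used instead of per-sample pixel rankings, and "removal" means replacing a feature with its training mean.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid model: one mean vector per class (labels 0 and 1)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def accuracy(centroids, X, y):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((d.argmin(axis=1) == y).mean())

def roar_accuracy(X_tr, y_tr, X_te, y_te, ranking, k):
    """Replace the top-k ranked features with the training mean,
    retrain from scratch, and return test accuracy."""
    drop = ranking[:k]
    fill = X_tr.mean(axis=0)
    X_tr2, X_te2 = X_tr.copy(), X_te.copy()
    X_tr2[:, drop] = fill[drop]
    X_te2[:, drop] = fill[drop]
    return accuracy(fit_centroids(X_tr2, y_tr), X_te2, y_te)

def make_data(rng, n):
    """Five features; only feature 0 carries class information."""
    y = np.repeat([0, 1], n)
    X = rng.normal(0.0, 1.0, size=(2 * n, 5))
    X[:, 0] += np.where(y == 1, 2.0, -2.0)
    return X, y

rng = np.random.default_rng(0)
X_tr, y_tr = make_data(rng, 500)
X_te, y_te = make_data(rng, 500)

good_ranking = np.array([0, 1, 2, 3, 4])  # ranks the informative feature first
bad_ranking = np.array([4, 3, 2, 1, 0])   # ranks it last

acc_good = roar_accuracy(X_tr, y_tr, X_te, y_te, good_ranking, k=1)
acc_bad = roar_accuracy(X_tr, y_tr, X_te, y_te, bad_ranking, k=1)
```

Under ROAR, a ranking is judged better when removing its top features causes a larger accuracy drop after retraining, so here the ranking that places the informative feature first should yield the lower accuracy.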
