Geometric Remove-and-Retrain (GOAR): Coordinate-Invariant eXplainable AI Assessment

Yong-Hyun Park, Junghoon Seo, Bomseok Park, Seongsu Lee, Junghyo Jo

TL;DR

This work identifies fundamental geometric limitations in pixel-based attribution benchmarks (ROAR/ROAD) that rely on pixel coordinates and all-or-none removals. It proposes Geometric Remove-and-Retrain (GOAR), a coordinate-invariant perturbation method that shifts samples along feature directions and uses diffusion-based manifold projection (SDEdit adaptation) to keep perturbed data on the data manifold, followed by retraining and counting cumulative misclassifications. Across synthetic, vision, and tabular datasets, GOAR demonstrates higher alignment with ground-truth feature assessments (OpenXAI) and superior discrimination among attribution methods, albeit with high computational cost. The approach offers a more reliable, geometry-aware standard for evaluating feature attributions with practical applicability to diverse data domains, while highlighting areas for efficiency improvements and further debiasing work.
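The perturbation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `attribution_fn` is a hypothetical callable that returns one attribution vector per sample, and each step moves a sample a fixed distance `step` against its normalized attribution. `toy_line_projection` is a hypothetical stand-in for the diffusion-based projection: it follows the SDEdit recipe (partially noise the sample, then denoise), but replaces a trained diffusion model with the exact posterior mean for a toy prior whose data lie on a line through the origin. Retraining between steps is omitted.

```python
import numpy as np


def goar_perturb(x, attribution_fn, step, n_steps, project=lambda z: z):
    """Repeatedly shift samples against their attribution direction.

    x: array of shape (n_samples, n_features).
    attribution_fn: hypothetical callable mapping x to attributions of the same shape.
    project: optional map back onto the data manifold (identity by default).
    Returns the list of datasets at steps 0..n_steps.
    """
    trajectory = [np.asarray(x, dtype=float)]
    for _ in range(n_steps):
        current = trajectory[-1]
        a = np.asarray(attribution_fn(current), dtype=float)
        # Unit feature direction per sample; the small constant avoids division by zero.
        direction = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
        trajectory.append(project(current - step * direction))
    return trajectory


def toy_line_projection(x, u, alpha_bar, rng):
    """SDEdit-style noise-then-denoise for a toy prior x0 = s * u, s ~ N(0, 1).

    Forward noising: x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps.
    Denoising: exact posterior mean E[x0 | x_t] = sqrt(alpha_bar) * u * (u . x_t),
    which always lies on the line spanned by the unit vector u.
    """
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    eps = rng.standard_normal(x.shape)
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    return np.sqrt(alpha_bar) * np.outer(x_t @ u, u)
```

In this toy, `alpha_bar` plays the role of SDEdit's starting noise level: values close to 1 keep most of the perturbed sample before it is mapped onto the line, while smaller values inject more noise and shrink the result toward the prior mean.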

Abstract

Identifying the input features that critically influence a model's output is indispensable for the development of explainable artificial intelligence (XAI). Remove-and-Retrain (ROAR) is a widely accepted approach for assessing the importance of individual pixels: it measures the change in accuracy after those pixels are removed and the model is retrained on the modified dataset. However, we uncover notable limitations in pixel-perturbation strategies. Viewed from a geometric perspective, these metrics fail to discriminate among feature attribution methods, which compromises the reliability of the evaluation. To address this challenge, we introduce an alternative feature-perturbation approach named Geometric Remove-and-Retrain (GOAR). Through a series of experiments on both synthetic and real datasets, we substantiate that GOAR transcends the limitations of pixel-centric metrics.
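The pixel-removal step of ROAR, and the coordinate dependence behind the geometric critique, can be illustrated with a small sketch. It is a hedged illustration, not the benchmark's code: `roar_remove` and `rotation` are hypothetical helpers, resetting removed coordinates to the per-feature dataset mean is an assumption standing in for ROAR's uninformative fill value, and the retraining stage is omitted. The same two points and the same feature vector, expressed in two coordinate systems, lose different information under removal (compare Figure 2a).

```python
import numpy as np


def roar_remove(X, attributions, frac):
    """Reset each sample's top-`frac` coordinates (ranked by |attribution|)
    to the per-feature dataset mean, a stand-in for ROAR's uninformative value."""
    X = np.asarray(X, dtype=float)
    A = np.abs(np.asarray(attributions, dtype=float))
    n, d = X.shape
    k = int(round(frac * d))
    out = X.copy()
    if k == 0:
        return out
    fill = X.mean(axis=0)
    top = np.argsort(-A, axis=1)[:, :k]  # (n, k) indices of the largest attributions
    rows = np.arange(n)[:, None]
    out[rows, top] = fill[top]
    return out


def rotation(theta):
    """2D rotation matrix by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


# Two samples from different classes sharing one feature direction.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
A = np.array([[1.0, 0.9], [1.0, 0.9]])

# Original coordinates: only the first coordinate is reset, so the samples
# remain separable along the second coordinate.
kept = roar_remove(X, A, frac=0.5)

# Same geometry, coordinates rotated by -45 degrees: the data now lie on the
# first axis, and resetting that one coordinate erases the class difference.
R = rotation(-np.pi / 4)
erased = roar_remove(X @ R.T, A @ R.T, frac=0.5)
```

Here `kept` is approximately `[[0, 1], [0, -1]]`, so the classes stay distinguishable, while both rows of `erased` collapse to approximately `[0, 0]`. Only the coordinate system changed between the two calls.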

Paper Structure (44 sections, 10 figures, 1 table)

Figures (10)

  • Figure 1: Comparison between previous perturbation strategies (ROAR, ROAD) and ours (GOAR). The perturbation-based metric evaluates the significance of features by comparing the model's performance between the original dataset (a) and a modified dataset where significant features have been removed (b, c, d). The top row illustrates the feature removal process with a 2D dataset, while the bottom row visualizes the samples after removing features based on each benchmark's perturbation method. The shaded regions in the 2D dataset represent the decision boundaries of models trained on each dataset. (a) The black arrow indicates the feature discovered using the attribution method for the sample marked with a circle. (b, c) Pixel-perturbation methods like Remove-and-Retrain (ROAR) and Remove-and-Debias (ROAD) fail to eliminate important information from samples, resulting in little degradation in performance. This is because these methods move samples in a pixel-basis direction (red arrow) unrelated to the feature direction (black arrow) when erasing information. (d) In contrast, our GOAR offers a precise and effective way to remove features, enabling an accurate comparison of attribution methods.
  • Figure 2: Pitfalls of ROAR. Red and green circles are two data points belonging to different classes. Black arrows ($\rightarrow$) represent the feature vectors obtained from the attribution method, and dashed arrows ($\dashrightarrow$) indicate the pixel perturbation applied by ROAR. Red borders mark the points where a performance drop occurs. (a) ROAR produces different results in the same geometric situation depending on the choice of coordinates. (b) ROAR can produce the same results for different features, which makes it difficult to discriminate between them.
  • Figure 3: Projection onto the data manifold. Perturbation of input data (a) without manifold projection and (b) with manifold projection.
  • Figure 4: Monitoring Performance Degradation. (top) GOAR perturbs each sample in the direction opposite to its feature. Continued perturbation eventually makes the mixed samples distinguishable again. (bottom) Plain accuracy does not correctly capture the number of samples that lost information, whereas cumulative accuracy does.
  • Figure 5: Feature assessment with a toy dataset. Comparison of features using ROAR, ROAD, and GOAR. The arrows in the inset depict the directions of features with varying magnitudes $\lambda$ of perturbation. GOAR is the only method for which the performance drop becomes significant when the features are similar to the original feature (blue arrow with $\lambda=1$). Note that symbols of different colors overlap in the ROAR and ROAD plots.
  • ...and 5 more figures
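The distinction in Figure 4 between accuracy and cumulative accuracy can be made concrete with a short sketch. It assumes one plausible reading of the caption: a sample counts as lost from the first step at which it is misclassified, even if later steps (continued perturbation followed by retraining) make it classifiable again. `accuracy_curves` is a hypothetical helper name, and the correctness matrix below is made-up toy input, not data from the paper.

```python
import numpy as np


def accuracy_curves(correct):
    """Plain and cumulative accuracy over perturbation steps.

    correct: boolean array of shape (n_steps, n_samples); correct[t, i] is True
    if sample i is classified correctly after perturbation step t.
    Cumulative accuracy counts a sample as intact only while it has been
    classified correctly at every step so far.
    """
    correct = np.asarray(correct, dtype=bool)
    plain = correct.mean(axis=1)
    # Running AND along the step axis: False from the first misclassification on.
    never_lost = np.logical_and.accumulate(correct, axis=0)
    cumulative = never_lost.mean(axis=1)
    return plain, cumulative


# Four samples over three steps: two flip at step 1 and "recover" at step 2.
correct = [[True, True, True, True],
           [False, False, True, True],
           [True, True, True, False]]
plain, cumulative = accuracy_curves(correct)
# plain      -> [1.0, 0.5, 0.75]
# cumulative -> [1.0, 0.5, 0.25]
```

Plain accuracy rises again at the last step because the two flipped samples became separable once more, whereas cumulative accuracy keeps counting them as lost, which matches the behavior the caption describes.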