On Pitfalls of $\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective
Junhwa Song, Keumgang Cha, Junghoon Seo
TL;DR
This paper challenges the reliability of ROAR and its variant ROAD as universal benchmarks for feature-importance explanations. By framing attribution informativeness through a causal model of data generation and applying a conditional data-processing inequality (DPI), the authors show that model-agnostic post-processing can unintentionally improve ROAR scores without adding information about the decision function. They demonstrate, both theoretically with DPI arguments and empirically on CIFAR-10, SVHN, and CUB-200, that blur-inducing post-processing and block-based masking can yield better ROAR results even when the resulting attribution carries less information about the model. The work cautions researchers about data-structure biases and calls for broader, more robust perturbation-based evaluation frameworks beyond ROAR.
Abstract
Approaches for appraising feature importance approximations, alternatively referred to as attribution methods, have been established across an extensive array of contexts. The development of resilient techniques for performance benchmarking constitutes a critical concern in the sphere of explainable deep learning. This study scrutinizes the dependability of the RemOve-And-Retrain (ROAR) procedure, which is prevalently employed for gauging the performance of feature importance estimates. The insights gleaned from our theoretical foundation and empirical investigations reveal that attributions containing less information about the decision function may yield superior results in ROAR benchmarks, contradicting the original intent of ROAR. This phenomenon is also observed in the recently introduced variant RemOve-And-Debias (ROAD), and we posit a persistent pattern of blurriness bias in ROAR attribution metrics. Our findings serve as a warning against the indiscriminate use of ROAR metrics.
