i-WiViG: Interpretable Window Vision GNN
Ivica Obadic, Dmitry Kangin, Dario Oliveira, Plamen P Angelov, Xiao Xiang Zhu
TL;DR
The work tackles the interpretability gap in Vision GNNs for remote sensing by proposing i-WiViG, which combines local window-based graph processing with an interpretable graph bottleneck that ranks long-range inter-patch relations. This yields a self-explanatory model that identifies the subgraph responsible for each prediction while maintaining competitive accuracy on classification (NWPU-RESISC45) and regression (Liveability) tasks. Qualitative and quantitative analyses show that the explanations are faithful: post-hoc explanations of i-WiViG have lower infidelity than those of other Vision GNN models, with no loss of explanation sparsity. Overall, i-WiViG moves graph-based vision models for remote sensing toward inherent interpretability, enabling interpretable long-range reasoning without sacrificing performance.
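As a rough illustration of how window-based graph processing limits each node's receptive field to a local image region, the sketch below builds a k-nearest-neighbour patch graph in which edges may only connect patches inside the same non-overlapping window. The function name `window_knn_edges`, the cosine-similarity kNN rule, and the parameters `window` and `k` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def window_knn_edges(feats, grid_h, grid_w, window, k):
    """Hypothetical sketch: k-NN patch graph restricted to non-overlapping windows.

    feats: (grid_h * grid_w, d) patch features in row-major patch order.
    Returns an (E, 2) int array of directed edges (src, dst), where dst is one of
    the k most cosine-similar patches to src *within the same window*.
    """
    if grid_h % window or grid_w % window:
        raise ValueError("grid size must be divisible by the window size")
    # L2-normalise so that dot products are cosine similarities
    norm = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    edges = []
    for wr in range(0, grid_h, window):
        for wc in range(0, grid_w, window):
            # Row-major indices of the patches that belong to this window
            idx = np.array([r * grid_w + c
                            for r in range(wr, wr + window)
                            for c in range(wc, wc + window)])
            sim = norm[idx] @ norm[idx].T
            np.fill_diagonal(sim, -np.inf)  # exclude self-loops
            kk = min(k, len(idx) - 1)
            # Most similar neighbours first; self-loops (-inf) sort last
            nbrs = np.argsort(-sim, axis=1)[:, :kk]
            for i, row in enumerate(nbrs):
                for j in row:
                    edges.append((int(idx[i]), int(idx[j])))
    return np.array(edges, dtype=int).reshape(-1, 2)


rng = np.random.default_rng(0)
edges = window_knn_edges(rng.normal(size=(16, 8)), grid_h=4, grid_w=4, window=2, k=2)
print(edges.shape)  # (32, 2)
```

For a 4×4 patch grid split into 2×2 windows with `k=2`, each of the 16 patches receives two neighbours from its own window, giving 32 edges and no edge that crosses a window boundary; long-range relations therefore have to be handled by a separate mechanism.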
Abstract
Deep learning models based on graph neural networks have emerged as a popular approach for solving computer vision problems. They encode the image into a graph structure, which can be beneficial for efficiently capturing the long-range dependencies typically present in remote sensing imagery. However, an important drawback of these methods is their black-box nature, which may hamper their wider use in critical applications. In this work, we tackle the self-interpretability of graph-based vision models by proposing our Interpretable Window Vision GNN (i-WiViG) approach, which provides explanations by automatically identifying the subgraphs relevant to the model prediction. This is achieved with window-based image graph processing, which constrains each node's receptive field to a local image region, and with a self-interpretable graph bottleneck that ranks the importance of the long-range relations between image regions. We evaluate our approach on remote sensing classification and regression tasks, showing that it achieves competitive performance while providing inherent and faithful explanations through the identified relations. Further, the quantitative evaluation reveals that our model reduces the infidelity of post-hoc explanations compared to other Vision GNN models, without sacrificing explanation sparsity.
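The idea of a self-interpretable bottleneck that ranks long-range relations can be sketched as edge scoring followed by top-k selection: every ordered pair of region embeddings gets an importance score, only the highest-scoring relations are kept, and message passing runs over that retained subgraph, which then doubles as the explanation. The bilinear scoring function, the fixed matrix `W` standing in for learned parameters, and the name `edge_bottleneck` are hypothetical choices for illustration; the paper's actual bottleneck may be designed differently.

```python
import numpy as np


def edge_bottleneck(h, W, top_k):
    """Hypothetical edge-ranking bottleneck over region embeddings.

    h:     (n, d) embeddings of image regions (e.g. one per window).
    W:     (d, d) bilinear scoring matrix, a stand-in for learned weights.
    top_k: number of long-range relations kept as the explanatory subgraph.

    Returns updated embeddings and the kept relations as (i, j, score),
    where node i aggregates a message from node j, sorted by descending score.
    """
    n = h.shape[0]
    if not 0 < top_k <= n * (n - 1):
        raise ValueError("top_k must be in [1, n*(n-1)]")
    # Bilinear relation score squashed to (0, 1) with a sigmoid
    scores = 1.0 / (1.0 + np.exp(-(h @ W @ h.T)))
    np.fill_diagonal(scores, -np.inf)  # never rank self-relations
    # Flattened indices of the top_k scores, highest first
    flat = np.argsort(-scores, axis=None)[:top_k]
    src, dst = np.unravel_index(flat, scores.shape)
    # Sparse, score-weighted adjacency containing only the kept relations
    adj = np.zeros((n, n))
    adj[src, dst] = scores[src, dst]
    # One residual message-passing step restricted to the selected subgraph
    h_out = h + adj @ h
    kept = list(zip(src.tolist(), dst.tolist(), scores[src, dst].tolist()))
    return h_out, kept


rng = np.random.default_rng(0)
h = rng.normal(size=(6, 4))
W = rng.normal(size=(4, 4))
h_out, kept = edge_bottleneck(h, W, top_k=3)
for i, j, s in kept:
    print(f"region {i} <- region {j}: score {s:.3f}")
```

Because only `top_k` relations carry messages in this sketch, the returned `kept` list is the complete set of inter-region interactions that changed `h_out`, which is the sense in which such a bottleneck is self-explanatory.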
