i-WiViG: Interpretable Window Vision GNN

Ivica Obadic, Dmitry Kangin, Dario Oliveira, Plamen P Angelov, Xiao Xiang Zhu

TL;DR

The work tackles the interpretability gap in Vision GNNs for remote sensing by proposing i-WiViG, which combines local window-based graph processing with an interpretable graph bottleneck that ranks long-range relations between image regions. This yields a self-interpretable model that identifies the subgraph responsible for each prediction while maintaining competitive performance on classification (NWPU-RESISC45) and regression (Liveability) tasks. Qualitative and quantitative analyses indicate that the identified relations give faithful explanations, and that post-hoc explanations of i-WiViG have lower infidelity than those of other Vision GNN models, without sacrificing explanation sparsity. Overall, i-WiViG advances inherently interpretable graph-based vision models for remote sensing applications, enabling interpretable long-range reasoning at competitive performance.

Abstract

Deep learning models based on graph neural networks have emerged as a popular approach for solving computer vision problems. They encode the image into a graph structure and can be beneficial for efficiently capturing the long-range dependencies typically present in remote sensing imagery. However, an important drawback of these methods is their black-box nature, which may hamper their wider usage in critical applications. In this work, we tackle the self-interpretability of graph-based vision models by proposing our Interpretable Window Vision GNN (i-WiViG) approach, which provides explanations by automatically identifying the relevant subgraphs for the model prediction. This is achieved with window-based image graph processing that constrains the node receptive field to a local image region and by using a self-interpretable graph bottleneck that ranks the importance of the long-range relations between the image regions. We evaluate our approach on remote sensing classification and regression tasks, showing that it achieves competitive performance while providing inherent and faithful explanations through the identified relations. Further, the quantitative evaluation reveals that our model reduces the infidelity of post-hoc explanations compared to other Vision GNN models, without sacrificing explanation sparsity.
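
To make the window-based processing concrete, the following is a minimal sketch of how a grid of image patches could be grouped into local windows, each of which would then act as one node of the global graph. It is not the authors' implementation: the function name `window_partition`, the use of non-overlapping square windows, and the 4 x 4 grid with 2 x 2 windows are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Patch = Tuple[int, int]  # (row, col) position of a patch in the patch grid


def window_partition(grid_h: int, grid_w: int, win: int) -> Dict[int, List[Patch]]:
    """Group a grid_h x grid_w patch grid into non-overlapping win x win windows.

    Hypothetical helper: returns a mapping from window id to the patches it
    contains. Window ids are assigned in row-major order.
    """
    if grid_h % win or grid_w % win:
        raise ValueError("grid size must be divisible by the window size")
    windows_per_row = grid_w // win
    windows: Dict[int, List[Patch]] = {}
    for r in range(grid_h):
        for c in range(grid_w):
            # Integer division maps each patch to the window that contains it.
            wid = (r // win) * windows_per_row + (c // win)
            windows.setdefault(wid, []).append((r, c))
    return windows


# Assumed toy setting: a 4 x 4 patch grid split into 2 x 2 windows.
windows = window_partition(4, 4, 2)
```

Under this assumed setup, a graph built only among the patches of one window keeps each node's receptive field inside that local image region, which is the property the abstract relies on for interpretability.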

Paper Structure

This paper contains 26 sections, 2 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: VisionGNN approaches, visualized in the top row, yield large and overlapping receptive fields for the graph nodes, which limits their interpretability to standard vision post-hoc attribution methods such as saliency maps. In contrast, our i-WiViG approach, depicted in the bottom row, offers an inherently interpretable model that relies on local window graph processing to constrain the node receptive field. It further introduces an interpretable graph bottleneck that learns a ranking of the long-range dependencies between the local windows, which inherently explains how the model works. Such explanations cannot be obtained from the VisionGNN models in the top row.
  • Figure 2: i-WiViG is split into several steps: (1) per-patch representation learning, (2) per-window graph formation, (3) graph representation learning for the window graphs, and (4) interpretable graph representation learning over the global graph, where the nodes represent windows. As output, our model yields the prediction along with a ranking of the edge importance for the outcome.
  • Figure 3: The relevant subgraphs within the top-5 edge importance percentile for the predictions of the i-WiViG model on examples of the classes Bridge (left), Airplane (middle), and Medium Residential Area (right) in the NWPU-RESISC45 dataset.
  • Figure 4: Examples of identified subgraphs containing edges within the top-5 importance percentile for the predictions of the i-WiViG model on examples of high (left), medium (centre) and low liveability areas (right) from the Liveability dataset.
  • Figure 5: RESISC45 edge attribution evaluation under incremental addition of edges, starting from the edges with the highest importance (blue curve) or from the edges with the lowest importance (orange curve).
  • ...and 4 more figures
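
The captions of Figures 3 to 5 refer to two operations on the learned edge ranking: keeping only edges in a top importance percentile, and adding edges incrementally from most or least important. A minimal sketch of both is given below, assuming the edge importances are already available as a plain dictionary. The function names, the toy scores, and the tie-breaking by insertion order are illustrative assumptions and do not come from the paper; the paper's evaluation additionally measures the model output after each addition, which is not reproduced here.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source window id, target window id)


def top_percentile_edges(importance: Dict[Edge, float], pct: float) -> List[Edge]:
    """Return the edges whose importance lies in the top `pct` percent.

    Hypothetical helper: at least one edge is kept when any edge exists.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    k = max(1, math.ceil(len(ranked) * pct / 100.0))
    return ranked[:k]


def incremental_edge_sets(
    importance: Dict[Edge, float], most_important_first: bool = True
) -> List[List[Edge]]:
    """Return growing edge sets, adding one edge per step in ranked order."""
    ranked = sorted(importance, key=importance.get, reverse=most_important_first)
    return [ranked[:i] for i in range(1, len(ranked) + 1)]


# Made-up importance scores for four edges between windows (illustration only).
importance = {(0, 1): 0.9, (0, 2): 0.1, (1, 3): 0.5, (2, 3): 0.3}

top_half = top_percentile_edges(importance, 50)            # two strongest edges
high_first = incremental_edge_sets(importance, True)       # ordering as in the blue curve
low_first = incremental_edge_sets(importance, False)       # ordering as in the orange curve
```

If the ranking is faithful, evaluating the model on the `high_first` edge sets would be expected to recover the full-graph prediction with fewer edges than the `low_first` sets.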