Discrete Subgraph Sampling for Interpretable Graph based Visual Question Answering
Pascal Tilli, Ngoc Thang Vu
TL;DR
This work addresses the need for intrinsically interpretable graph-based visual question answering (GVQA) by using discretely sampled subgraphs as explanations. It integrates four sampling methods (AIMLE, IMLE, SIMPLE, and Gumbel SoftSub-ST) into a GVQA system that uses CLIP-based embeddings and a fixed subgraph size. The methods are evaluated on GQA using answer accuracy and two token co-occurrence metrics, answer token co-occurrence (AT-COO) and question token co-occurrence (QT-COO), complemented by a human study. The results show that AIMLE and SIMPLE achieve strong accuracy with high explanatory co-occurrences, while Gumbel SoftSub-ST underperforms unless carefully tuned. Human preferences align with the AT-COO and QT-COO rankings, which supports using these metrics as proxies for interpretability. Overall, the paper provides a principled comparison and practical guidance for selecting intrinsic subgraph sampling methods that balance interpretability and predictive performance in multimodal reasoning tasks.
Abstract
Explainable artificial intelligence (XAI) aims to make machine learning models more transparent. While many approaches focus on generating explanations post-hoc, interpretable approaches, which generate the explanations intrinsically alongside the predictions, are relatively rare. In this work, we integrate different discrete subset sampling methods into a graph-based visual question answering system to compare their effectiveness in generating interpretable explanatory subgraphs intrinsically. We evaluate the methods on the GQA dataset and show that the integrated methods effectively mitigate the performance trade-off between interpretability and answer accuracy, while also achieving strong co-occurrences between answer and question tokens. Furthermore, we conduct a human evaluation to assess the interpretability of the generated subgraphs using a comparative setting with the extended Bradley-Terry model, showing that the answer and question token co-occurrence metrics strongly correlate with human preferences. Our source code is publicly available.
