Free Argumentative Exchanges for Explaining Image Classifiers
Avinash Kori, Antonio Rago, Francesca Toni
TL;DR
This work tackles the opacity of deep image classifiers by introducing Free Argumentative Exchanges (FAXs), a dialogue-based explanation framework in which two agents $\mathcal{A}^1$ and $\mathcal{A}^2$ argue for and against the top predicted class. FAXs instantiate a Bipolar Argumentation Framework (BAF) with private feature sets and dialectically monotonic policies, producing exchange BAFs that reflect the classifier's reasoning. The paper introduces two evaluation metrics (consensus and persuasion rate) and demonstrates, through experiments on FFHQ/AFHQ with ResNet-18 and DenseNet-121, that FAXs yield faithful, fine-grained explanations and expose reasoning gaps in biased models. The implementation hinges on class-specific discrete features learned via codebooks, a quantized classifier $q$, and REINFORCE-based training of two class-specific agents. The approach opens avenues for richer interpretability and suggests extensions to medical imaging and hierarchical concept learning.
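To make the vocabulary above concrete, the toy sketch below shows a bipolar argumentation framework (arguments plus attack and support relations) and an exchange in which two agents alternately contribute arguments from private sets into a shared exchange BAF. It is a hypothetical illustration, not the authors' implementation: the names `BAF`, `run_exchange` and `net_support`, the fixed turn-taking rule, and the score (direct supporters minus direct attackers of the claim) are all assumptions. It does not model learned codebook features, the quantized classifier $q$, REINFORCE training, or agents internalising arguments differently from how they were stated.

```python
from dataclasses import dataclass, field


@dataclass
class BAF:
    """Toy bipolar argumentation framework (hypothetical sketch)."""
    arguments: set = field(default_factory=set)
    attacks: set = field(default_factory=set)   # (a, b) means a attacks b
    supports: set = field(default_factory=set)  # (a, b) means a supports b

    def add(self, arg, target=None, relation=None):
        """Add an argument, optionally attacking or supporting an existing one."""
        if (target is None) != (relation is None):
            raise ValueError("target and relation must be given together")
        if relation not in (None, "attack", "support"):
            raise ValueError(f"unknown relation: {relation!r}")
        if target is not None and target not in self.arguments:
            raise ValueError(f"unknown target argument: {target!r}")
        self.arguments.add(arg)
        if relation == "attack":
            self.attacks.add((arg, target))
        elif relation == "support":
            self.supports.add((arg, target))


def run_exchange(claim, private1, private2, max_turns=10):
    """Assumed protocol: agents alternate turns, each adding the next
    (argument, target, relation) triple from its private list."""
    baf = BAF()
    baf.add(claim)
    queues = [list(private1), list(private2)]
    for turn in range(max_turns):
        if not any(queues):  # both private sets exhausted
            break
        queue = queues[turn % 2]
        if queue:  # an agent with nothing left simply passes
            arg, target, relation = queue.pop(0)
            baf.add(arg, target, relation)
    return baf


def net_support(baf, claim):
    """Toy score: direct supporters minus direct attackers of the claim."""
    sup = sum(1 for _, b in baf.supports if b == claim)
    att = sum(1 for _, b in baf.attacks if b == claim)
    return sup - att


baf = run_exchange(
    "class=cat",
    private1=[("pointed_ears", "class=cat", "support"),
              ("whiskers", "class=cat", "support")],
    private2=[("long_snout", "class=cat", "attack"),
              ("small_size", "pointed_ears", "attack")],
)
print(sorted(baf.arguments))
print(net_support(baf, "class=cat"))
```

Here agent 1 supplies two supporting features, while agent 2 attacks both the claim and one of agent 1's arguments; the exchange BAF records every contribution, so the resulting graph itself serves as the (toy) explanation.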
Abstract
Deep learning models are powerful image classifiers, but their opacity hinders their trustworthiness. Because of the sheer complexity and size of these classifiers, explanation methods that capture their reasoning process faithfully and clearly are scarce. We address this problem by defining a novel method for explaining the outputs of image classifiers through debates between two agents, each arguing for a particular class. We obtain these debates as concrete instances of Free Argumentative eXchanges (FAXs), a novel argumentation-based multi-agent framework that allows agents to internalise opinions put forward by other agents differently from how they were originally stated. We define two metrics (consensus and persuasion rate) to assess the usefulness of FAXs as argumentative explanations for image classifiers. We then conduct empirical experiments showing that FAXs perform well on these metrics and are more faithful to the image classifiers than conventional, non-argumentative explanation methods. All our implementations can be found at https://github.com/koriavinash1/FAX.
