TbExplain: A Text-based Explanation Method for Scene Classification Models with the Statistical Prediction Correction

Amirhossein Aminimehr, Pouya Khani, Amirali Molaei, Amirmohammad Kazemeini, Erik Cambria

TL;DR

TbExplain tackles the interpretability gap in scene classification by replacing traditional heatmaps with text-based explanations grounded in detected objects. It integrates six modules, including Object Validation via an Overlapping Score and a Statistical Prediction Correction (SPC) that uses object-class weights to revise predictions when confidence is low. The approach yields reliable, object-grounded explanations and, in quantitative tests across MIT67, Places365, and SUN397, demonstrates accuracy gains over several ResNet baselines, with further improvements when using LIME or GradCAM within the explanation module. Overall, TbExplain enhances both trust and performance in scene understanding by coupling textual narratives with principled object-based correction. The framework supports practical adoption for interpretability-focused applications and may inform future XAI methods for vision tasks.

Abstract

The field of Explainable Artificial Intelligence (XAI) aims to improve the interpretability of black-box machine learning models. A popular way to explain how such models arrive at their predictions is to build a heatmap from the importance values of the input features. Heatmaps are largely understandable to humans, yet they are not without flaws. Non-expert users, for example, may not fully understand the logic of heatmaps, in which pixels relevant to the model's prediction are highlighted with different intensities or colors. Additionally, heatmaps frequently fail to fully differentiate the objects and regions of the input image that are relevant to the model's prediction. In this paper, we propose a framework called TbExplain that employs XAI techniques and a pre-trained object detector to present text-based explanations of scene classification models. Moreover, TbExplain incorporates a novel method that corrects predictions, and explains them textually, based on the statistics of objects in the input image when the initial prediction is unreliable. To assess the trustworthiness and validity of the text-based explanations, we conducted a qualitative experiment, whose findings indicated that these explanations are sufficiently reliable. Furthermore, our quantitative and qualitative experiments on TbExplain with scene classification datasets reveal an improvement in classification accuracy over ResNet variants.

Paper Structure

This paper contains 13 sections, 4 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: An overview of the proposed framework.
  • Figure 2: The structure of the overlap scoring function. $b$ is the object bounding box and $OR$ is the overlapping region.
  • Figure 3: Outputs of TbExplain in three scenarios.
  • Figure 4: Examples of the sentence generation's outputs (i.e., the text-based explanation).
  • Figure 5: Recognition accuracy of TbExplain on the validation set based on the specified thresholds $T_C$, $T_R$, and $T_P$.
  • ...and 1 more figure
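
Figure 2 refers to an overlap scoring function defined over an object bounding box $b$ and an overlapping region $OR$, but the formula itself is not reproduced on this page. As a minimal sketch, assuming the score is the fraction of $b$ covered by the region that an XAI heatmap marks as relevant, Object Validation might look like the following. The names `overlap_score` and `validate_objects`, the box format, and the threshold value are hypothetical and are not taken from the paper.

```python
import numpy as np


def overlap_score(relevance_mask, box):
    """Fraction of the bounding box covered by the highlighted region.

    ASSUMPTION: score = |OR| / |b|, where OR is the part of box b that
    falls inside the relevant region. The paper's exact formula may differ.

    relevance_mask: 2D boolean array, True where a pixel is marked relevant.
    box: (x_min, y_min, x_max, y_max) in pixels, max-exclusive, assumed to
         lie inside the image bounds.
    """
    x_min, y_min, x_max, y_max = box
    area = (x_max - x_min) * (y_max - y_min)
    if area <= 0:
        return 0.0  # degenerate box: nothing to overlap with
    overlap = relevance_mask[y_min:y_max, x_min:x_max].sum()
    return float(overlap) / area


def validate_objects(relevance_mask, detections, threshold=0.5):
    """Keep the labels of detections whose overlap score reaches the threshold.

    detections: list of (label, box) pairs from any object detector.
    threshold: hypothetical cut-off, not a value reported by the authors.
    """
    return [
        label
        for label, box in detections
        if overlap_score(relevance_mask, box) >= threshold
    ]


# Toy 10x10 image in which a 4x4 square is marked relevant.
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:6] = True

# Hypothetical detections: one box inside the relevant square,
# one partially overlapping it, one entirely outside.
detections = [
    ("bed", (2, 2, 6, 6)),
    ("lamp", (4, 4, 8, 8)),
    ("window", (7, 7, 10, 10)),
]

for label, box in detections:
    print(label, overlap_score(mask, box))

print(validate_objects(mask, detections, threshold=0.5))
```

Under this reading, only objects that substantially coincide with the highlighted region are passed on to the text explanation, which is one way to keep the generated sentence grounded in what the classifier actually attended to.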