Allowing humans to interactively guide machines where to look does not always improve human-AI team's classification accuracy

Giang Nguyen, Mohammad Reza Taesiri, Sunnie S. Y. Kim, Anh Nguyen

TL;DR

This paper interrogates whether letting humans interactively guide where a vision model looks can improve human-AI team accuracy on fine-grained bird classification. It introduces CHM-Corr++, an interactive extension of the CHM-Corr classifier, enabling patch-level attention edits on a $7\times7$ grid and producing dynamic explanations via a Gradio-based interface. In a study with 18 ML experts across 1,400 decisions on CUB-200 imagery, interactive editing did not significantly improve accuracy over static explanations, though performance varied with whether the model’s initial prediction was correct and whether the interaction changed the outcome. The work highlights conditions under which interactivity can help or hinder verification, discusses limitations of patch-attention approaches, and provides open-source tooling and data to spur future research on dynamic explanations in computer vision.

Abstract

Via thousands of papers in Explainable AI (XAI), attention maps \cite{vaswani2017attention} and feature importance maps \cite{bansal2020sam} have been established as a common means for finding how important each input feature is to an AI's decisions. It is an interesting, unexplored question whether allowing users to edit the feature importance at test time would improve a human-AI team's accuracy on downstream tasks. In this paper, we address this question by leveraging CHM-Corr, a state-of-the-art, ante-hoc explainable classifier \cite{taesiri2022visual} that first predicts patch-wise correspondences between the input and training-set images, and then bases its classification decisions on them. We build CHM-Corr++, an interactive interface for CHM-Corr, enabling users to edit the feature importance map provided by CHM-Corr and observe updated model decisions. Via CHM-Corr++, users can gain insights into if, when, and how the model changes its outputs, improving their understanding beyond static explanations. However, our study with 18 expert users who performed 1,400 decisions finds no statistically significant evidence that our interactive approach improves user accuracy on CUB-200 bird image classification over static explanations. This challenges the hypothesis that interactivity can boost human-AI team accuracy and highlights the need for future research. We open-source CHM-Corr++, an interactive tool for editing image classifier attention (see an interactive demo here: http://137.184.82.109:7080/). We release code and data on GitHub: https://github.com/anguyen8/chm-corr-interactive.
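To make the idea of editing a feature importance map concrete, the following is a minimal sketch, not the authors' implementation. It assumes each candidate training image already comes with one similarity score per patch of a 7x7 grid, and that the user's edit is a binary keep/ignore mask over those 49 patches. The function names (`masked_score`, `predict`, `patch_index`), the scoring rule (summing kept similarities per label), and all numbers are hypothetical; the two labels are borrowed from the Figure 2 example only for illustration.

```python
from collections import defaultdict

GRID = 7  # patch grid size mentioned in the TL;DR (7x7 = 49 patches)


def patch_index(row, col):
    """Map a (row, col) grid position to a flat index in [0, 48]."""
    return row * GRID + col


def masked_score(patch_sims, mask):
    """Sum the per-patch similarities over the patches the user kept."""
    if len(patch_sims) != GRID * GRID or len(mask) != GRID * GRID:
        raise ValueError("expected one value per patch (49 values)")
    return sum(sim for sim, keep in zip(patch_sims, mask) if keep)


def predict(candidates, mask):
    """Return the label whose candidates have the highest total masked score.

    candidates: list of (label, patch_sims) pairs (hypothetical format).
    """
    totals = defaultdict(float)
    for label, sims in candidates:
        totals[label] += masked_score(sims, mask)
    return max(totals, key=totals.get)


# Toy data: one candidate matches mostly on the top row (e.g. background),
# the other matches only on the centre patch (e.g. the bird itself).
tanager = [0.0] * (GRID * GRID)
for col in range(GRID):
    tanager[patch_index(0, col)] = 0.9
tanager[patch_index(3, 3)] = 0.2

cardinal = [0.0] * (GRID * GRID)
cardinal[patch_index(3, 3)] = 0.8

candidates = [("Summer Tanager", tanager), ("Cardinal", cardinal)]

full_mask = [1] * (GRID * GRID)    # default: every patch counts
center_mask = [0] * (GRID * GRID)  # user edit: keep only the centre patch
center_mask[patch_index(3, 3)] = 1

before_edit = predict(candidates, full_mask)
after_edit = predict(candidates, center_mask)
```

In this toy setup the prediction flips once the background row is masked out, which is the kind of change a user would watch for when deciding whether to accept or reject the original top-1 label.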

Paper Structure (13 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Users provide feedback to the explainable AI by editing the feature importance map (like an "attention map" \cite{vaswani2017attention}).
  • Figure 2: Our CHM-Corr++ interactive interface. We let users interact with the image classification model (here CHM-Corr \cite{taesiri2022visual}) by controlling the attention, i.e., selecting the patches the model should look at (a). Based on the user-guided attention, the model compares the input image (GT class: Cardinal) with candidate training examples to simultaneously generate visual-correspondence explanations (b) and predictions (c). The user iteratively observes the dynamic explanations (b) and predictions (c) to understand the image classification model and then accepts or rejects (d) the original top-1 predicted label (here Summer Tanager).
  • Figure A1: Both dynamic and static explanations enable human users to verify that the AI is predicting the top-1 label correctly.
  • Figure A2: Human intervention changes the top-1 label from Rufous Hummingbird $\to$ Anna Hummingbird, which makes users more likely to reject the original, correct label.
  • Figure A3: The AI initially misclassifies the input image as Indigo Bunting. Human intervention changes the top-1 label from Indigo Bunting $\to$ Lazuli Bunting, a more similar-looking class, encouraging users to reject the original predicted label.
  • ...and 1 more figures