Allowing humans to interactively guide machines where to look does not always improve human-AI team's classification accuracy
Giang Nguyen, Mohammad Reza Taesiri, Sunnie S. Y. Kim, Anh Nguyen
TL;DR
This paper investigates whether letting humans interactively guide where a vision model looks can improve human-AI team accuracy on fine-grained bird classification. It introduces CHM-Corr++, an interactive extension of the CHM-Corr classifier that lets users edit patch-level attention on a $7\times7$ grid and produces dynamic explanations via a Gradio-based interface. In a study with 18 ML experts who made 1,400 decisions on CUB-200 imagery, interactive editing did not significantly improve accuracy over static explanations, though performance varied with whether the model’s initial prediction was correct and whether the interaction changed the outcome. The work highlights conditions under which interactivity can help or hinder verification, discusses limitations of patch-attention approaches, and provides open-source tooling and data to spur future research on dynamic explanations in computer vision.
Abstract
Across thousands of papers in Explainable AI (XAI), attention maps \cite{vaswani2017attention} and feature importance maps \cite{bansal2020sam} have been established as a common means of showing how important each input feature is to an AI's decisions. It is an interesting, unexplored question whether allowing users to edit the feature importance at test time would improve a human-AI team's accuracy on downstream tasks. In this paper, we address this question by leveraging CHM-Corr, a state-of-the-art, ante-hoc explainable classifier \cite{taesiri2022visual} that first predicts patch-wise correspondences between the input and training-set images, and then bases its classification decisions on them. We build CHM-Corr++, an interactive interface for CHM-Corr that enables users to edit the feature importance map provided by CHM-Corr and observe the updated model decisions. Via CHM-Corr++, users can gain insights into if, when, and how the model changes its outputs, improving their understanding beyond static explanations. However, our study with 18 expert users who made 1,400 decisions finds no statistically significant evidence that our interactive approach improves user accuracy on CUB-200 bird image classification over static explanations. This challenges the hypothesis that interactivity can boost human-AI team accuracy and raises the need for future research. We open-source CHM-Corr++, an interactive tool for editing image classifier attention (see an interactive demo here: http://137.184.82.109:7080/). We release code and data on GitHub: https://github.com/anguyen8/chm-corr-interactive.
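To make the idea of editing patch importance at test time concrete, the following is a minimal toy sketch, not the CHM-Corr or CHM-Corr++ implementation. It assumes each image is already encoded as 49 L2-normalised patch embeddings (a $7\times7$ grid), scores each training image by how well the user-selected query patches match its patches, and takes a majority vote over the top-$k$ training images. The function name `classify_with_mask`, the similarity scoring rule, and the parameter `k` are all hypothetical choices made for illustration.

```python
from collections import Counter

import numpy as np

GRID = 7  # side length of the patch grid (7x7 = 49 patches)


def classify_with_mask(query, gallery, labels, mask, k=3):
    """Toy patch-correspondence classifier (illustrative only).

    query:   (49, d) L2-normalised patch embeddings of the input image
    gallery: (n, 49, d) L2-normalised patch embeddings of training images
    labels:  length-n sequence of class labels for the training images
    mask:    (7, 7) boolean array; True marks patches the user keeps
    """
    keep = np.asarray(mask, dtype=bool).reshape(GRID * GRID)
    if not keep.any():
        raise ValueError("at least one patch must be selected")

    # Cosine similarity of every kept query patch to every gallery patch:
    # result has shape (n_images, n_kept_patches, 49).
    sims = np.einsum("qd,npd->nqp", query[keep], gallery)

    # For each kept query patch take its best-matching gallery patch,
    # then sum those best matches to get one score per training image.
    scores = sims.max(axis=2).sum(axis=1)

    # Majority vote over the k highest-scoring training images.
    top = np.argsort(-scores)[:k]
    votes = Counter(labels[i] for i in top)
    return votes.most_common(1)[0][0]
```

Calling the function twice on the same image, once with an all-`True` mask and once with a user-edited mask, shows whether the edit changes the predicted label, which is the kind of dynamic feedback the interactive interface is meant to give.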
