Interactivity x Explainability: Toward Understanding How Interactivity Can Improve Computer Vision Explanations

Indu Panigrahi, Sunnie S. Y. Kim, Amna Liaqat, Rohan Jinturkar, Olga Russakovsky, Ruth Fong, Parastoo Abtahi

TL;DR

Static computer vision explanations often overwhelm users, obscure the connection between pixels and semantic concepts, and limit exploration. The study (N=24) runs a within-subjects evaluation of three explanation types (heatmap-, concept-, and prototype-based) paired with three interactive mechanisms (Filtering, Overlays, Counterfactuals) on a bird-identification task to assess usability and user perception. Findings show that interactivity enhances user control, speeds access to relevant information, and broadens understanding of the model, but some mechanisms (notably Counterfactuals) can overwhelm users; Overlays aid pixel-to-semantics mapping and Filtering helps manage detail. The paper offers design recommendations (carefully selected default views, independent input controls, and constrained output spaces) for building user-centered interactive CV explanations.

Abstract

Explanations for computer vision models are important tools for interpreting how the underlying models work. However, they are often presented in static formats, which pose challenges for users, including information overload, a gap between semantic and pixel-level information, and limited opportunities for exploration. We investigate interactivity as a mechanism for tackling these issues in three common explanation types: heatmap-based, concept-based, and prototype-based explanations. We conducted a study (N=24), using a bird identification task, involving participants with diverse technical and domain expertise. We found that while interactivity enhances user control, facilitates rapid convergence to relevant information, and allows users to expand their understanding of the model and explanation, it also introduces new challenges. To address these, we provide design recommendations for interactive computer vision explanations, including carefully selected default views, independent input controls, and constrained output spaces.

Paper Structure

This paper contains 21 sections and 3 figures.

Figures (3)

  • Figure 1: 12 explanation mock-ups for 3 explanation types (rows) and 4 presentation types (columns). All bird images are from the Caltech-UCSD Birds-200-2011 (CUB) dataset.
  • Figure 2: Average participant ratings for survey statements. Lower is better. Error bars are 95% confidence intervals.
  • Figure 3: Average participant rankings for general preference and preference for learning bird species. Ties are allowed. Lower is better. Error bars are 95% confidence intervals.
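
Figures 2 and 3 report averages with 95% confidence intervals as error bars. The summary above does not say how those intervals were computed, so the following is only a minimal sketch of one common approach: the sample mean plus or minus a critical value times the standard error. The function name `mean_with_ci`, the default normal-approximation critical value of 1.96, and the example ratings are all assumptions made for illustration, not the authors' method or data.

```python
import math
import statistics


def mean_with_ci(ratings, critical_value=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Assumption: a normal approximation (critical value 1.96). For small
    samples, a Student's t critical value is the more conservative choice.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mean = statistics.fmean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(n)
    return mean, critical_value * sem


# Hypothetical ratings for one survey statement; not data from the study.
example_ratings = [1, 2, 2, 3, 1, 2, 4, 2]
mean, half_width = mean_with_ci(example_ratings)
print(f"mean={mean:.2f}, CI=[{mean - half_width:.2f}, {mean + half_width:.2f}]")
```

With a sample the size of this study (N=24), passing a t critical value for 23 degrees of freedom (about 2.069) as `critical_value` would give slightly wider error bars than the normal approximation.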