AttributionScanner: A Visual Analytics System for Model Validation with Metadata-Free Slice Finding

Xiwei Xuan, Jorge Piazentin Ono, Liang Gou, Kwan-Liu Ma, Liu Ren

TL;DR

AttributionScanner validates vision models on unstructured image data without relying on costly metadata. It is a metadata-free, human-in-the-loop visual analytics system that builds attribution-weighted feature vectors with GradCAM, groups them into interpretable data slices, and uses Feature Inversion to visualize each slice's common patterns in the Attribution Mosaic. The workflow progresses from explainable slice finding, to slice annotation and spuriousness propagation, and finally to slice-based mitigation using CoRM. Quantitative and qualitative evaluations on CelebA and Waterbirds show reduced spurious correlations and improved robustness, and expert feedback highlights the system's usability and actionable insights. The approach offers a scalable, interpretable framework for vision-model validation, with potential extensions to object detection, segmentation, and multimodal data.

Abstract

Data slice finding is an emerging technique for validating machine learning (ML) models by identifying and analyzing subgroups in a dataset that exhibit poor performance, often characterized by distinct feature sets or descriptive metadata. However, when validating vision models on unstructured image data, this approach faces significant challenges, including the laborious and costly requirement for additional metadata and the complex task of interpreting the root causes of underperformance. To address these challenges, we introduce AttributionScanner, a human-in-the-loop Visual Analytics (VA) system designed for metadata-free data slice finding. Our system identifies interpretable data slices that involve common model behaviors and visualizes these patterns through an Attribution Mosaic design. Our interactive interface guides users to detect, interpret, and annotate predominant model issues, such as spurious correlations (model biases) and mislabeled data, with minimal effort. Additionally, it employs a model regularization technique to mitigate the detected issues and enhance the model's performance. The efficacy of AttributionScanner is demonstrated through use cases involving two benchmark datasets, CelebA and Waterbirds, with qualitative and quantitative evaluations showing its effectiveness in vision model validation, ultimately leading to more reliable and accurate models.

Paper Structure

This paper contains 25 sections, 8 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: The AttributionScanner workflow involves three phases: Explainable Data Slice Finding, Slice Summarization and Annotation, and Slice Error Mitigation. In the first phase, GradCAM assists the generation of feature vectors, from which data slices are obtained. In the second phase, users identify and annotate slice error types, such as core/spurious correlations and noisy labels, with the help of the Attribution Mosaic and Spuriousness propagation. In the third phase, the annotations and user-verified Spuriousness are used on the ML side to mitigate slice errors.
  • Figure 2: AttributionScanner applied to the model validation of a hair color classifier trained on the CelebA dataset. (A) System Menu, enabling the selection of dataset, model, and visualization options (Layout, Confusion Matrix, Scatter Plot/Attribution Mosaic, and Contour Visibility). (B) Slice Table, showing slice metrics. (C) Attribution Mosaic, showing a visual overview of all data slices, which can also be displayed as a confusion matrix view (E). (D) Slice Detail View, showing individual images or Attribution Heatmaps belonging to a selected data slice.
  • Figure 3: Attribution-weighted feature vector generation. An image is forwarded through a CNN, from which the corresponding feature vector $F$ and weight matrix $W$ are extracted to calculate the attribution-weighted feature vector $F_W$.
  • Figure 4: Comparison of representation spaces. (a) Feature representation space computed on original feature vectors. (b) Attribution representation space computed on attribution-weighted feature vectors.
  • Figure 5: Attribution Mosaic generation. Data slices are computed from the attribution-weighted feature vectors ($F_W$). Feature Inversion is then conducted according to $F_W$ and the mosaic boundary of each slice to visualize the slice's common patterns, forming the Attribution Mosaic.
  • ...and 7 more figures
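
The attribution-weighted feature vector described in the Figure 3 caption can be illustrated with a small sketch. The exact formula for $F_W$ is not reproduced in this summary, so the code below is only one plausible reading under stated assumptions: the weight matrix $W$ is taken to be a GradCAM heatmap normalized to sum to 1, and $F_W$ is taken to be the heatmap-weighted spatial pooling of the last convolutional activations. The function name `attribution_weighted_feature` and the toy inputs are hypothetical, and plain Python lists stand in for tensors.

```python
def attribution_weighted_feature(activations, gradients):
    """Pool conv activations with a GradCAM-style heatmap (illustrative only).

    activations, gradients: nested lists shaped [channels][height][width],
    standing in for the last conv layer's feature maps and the gradients of
    the class score with respect to them.
    Returns (f_w, weight_map): one pooled value per channel, and the
    normalized heatmap used as pooling weights (assumed role of W).
    """
    channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])
    n_pixels = height * width

    # GradCAM channel importance: spatial mean of the gradients.
    alpha = [
        sum(sum(row) for row in gradients[c]) / n_pixels
        for c in range(channels)
    ]

    # GradCAM heatmap: ReLU of the importance-weighted sum over channels.
    heatmap = [
        [
            max(0.0, sum(alpha[c] * activations[c][i][j] for c in range(channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Assumption: normalize the heatmap so the weights sum to 1.
    # If the heatmap is all zeros, fall back to uniform (average) pooling.
    total = sum(sum(row) for row in heatmap)
    if total > 0:
        weight_map = [[v / total for v in row] for row in heatmap]
    else:
        weight_map = [[1.0 / n_pixels] * width for _ in range(height)]

    # Attribution-weighted pooling: one value per channel.
    f_w = [
        sum(
            weight_map[i][j] * activations[c][i][j]
            for i in range(height)
            for j in range(width)
        )
        for c in range(channels)
    ]
    return f_w, weight_map


if __name__ == "__main__":
    # Toy input: two channels on a 2x2 grid; only channel 0 receives gradient.
    acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    f_w, _ = attribution_weighted_feature(acts, grads)
    print(f_w)  # [1.0, 0.0]
```

In this toy case, plain average pooling would give `[0.25, 0.5]`, while the attribution-weighted vector keeps only the region the model attends to. This is the intuition behind comparing the two representation spaces in Figure 4, although the system's actual computation may differ from this sketch.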