Evaluating how interactive visualizations can assist in finding samples where and how computer vision models make mistakes

Hayeong Song, Gonzalo Ramos, Peter Bodik

TL;DR

The paper tackles the difficulty of building and debugging computer vision (CV) models by introducing two interactive visualizations, a timeline and a scatterplot, within the Sprite system to support planning and evaluation tasks. Through a between-subjects usability study, it shows that users of the visualizations identify a more diverse set of model-error examples and experience higher usability and lower cognitive load than users of a query-language baseline. The findings suggest that visual analytics can help subject-domain experts sample informative frames efficiently and guide labeling to improve CV performance, particularly for temporally correlated video data. The work also discusses generalizing these design principles to other time-series data and the potential of helper models for sampling guidance. Overall, the approach advances interactive machine teaching in CV by improving model diagnosis and data-driven improvement workflows.

Abstract

Creating Computer Vision (CV) models remains a complex practice, despite their ubiquity. Access to data, the requirement for ML expertise, and model opacity are just a few points of complexity that limit the ability of end-users to build, inspect, and improve these models. Interactive ML perspectives have helped address some of these issues by considering a teacher in the loop where planning, teaching, and evaluating tasks take place. We present and evaluate two interactive visualizations in the context of Sprite, a system for creating CV classification and detection models for images originating from videos. We study how these visualizations help Sprite's users identify (evaluate) and select (plan) images where a model is struggling and can lead to improved performance, compared to a baseline condition where users used a query language. We found that users who had used the visualizations found more images across a wider set of potential types of model errors.

Paper Structure (11 sections, 2 figures, 1 table)

Figures (2)

  • Figure 1: General UI of the Sprite system. A - The system's menu, where users can ingest (upload) videos into the system. B - Selector for the model currently being worked on or trained. C - Main sampling query area, used to retrieve images from the set of decoded images. D - Labeling selector that shows the current classes of the selected classification model and is used to choose the label to assign to a selected image. E - Grid of images retrieved by a query. F - Control to initiate a training operation and check prediction accuracy.
  • Figure 2: A - Scatterplot view used to inspect the predictions of two CV models: workerSize (classification) and worker (detection). The X-axis displays the prediction results of the worker size classifier. The Y-axis displays the prediction results of the worker detection model. B - The selected data point (the red circle) represents an image where the two models seem to agree. C - The image corresponding to the selected data point is shown along with a prediction label and score for the model currently being evaluated. To troubleshoot a classification model, a user can check data points with borderline prediction scores (pink area, e.g. $0.4 < \text{score} < 0.7$). The user can then check samples that combine a high worker detection score (a worker was detected) with a borderline classification score, since these are cases where the models seem to struggle (e.g., because of a noisy background). This type of visualization can help users quickly retrieve sub-samples of images that can improve a model, by letting them compare prediction results across semantically related CV models.
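
The selection described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not Sprite's implementation: the `Frame` record, the `borderline_frames` helper, and the `min_detection` cutoff of 0.8 are hypothetical names and values chosen for illustration. Only the borderline range of 0.4 to 0.7 comes from the caption.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical record holding the two prediction scores for one image."""
    frame_id: int
    classifier_score: float  # score from the workerSize classification model
    detector_score: float    # score from the worker detection model


def borderline_frames(frames, low=0.4, high=0.7, min_detection=0.8):
    """Keep frames with a borderline classifier score and a confident detection.

    low/high follow the example range in the caption; min_detection is an
    assumed threshold for a "high" detection score.
    """
    return [
        f for f in frames
        if low < f.classifier_score < high and f.detector_score >= min_detection
    ]


# Made-up scores for illustration only.
frames = [
    Frame(0, 0.95, 0.90),  # confident classifier: not borderline
    Frame(1, 0.55, 0.92),  # borderline classifier, worker detected
    Frame(2, 0.45, 0.30),  # borderline classifier, no confident detection
    Frame(3, 0.65, 0.85),  # borderline classifier, worker detected
]

selected_ids = [f.frame_id for f in borderline_frames(frames)]
print(selected_ids)  # [1, 3]
```

In the scatterplot, this filter corresponds to the part of the pink band that lies in the upper range of the Y-axis. The frames it returns are candidates for inspection and relabeling.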