Evaluating how interactive visualizations can assist in finding samples where and how computer vision models make mistakes
Hayeong Song, Gonzalo Ramos, Peter Bodik
TL;DR
The paper tackles the difficulty of building and debugging computer vision (CV) models by introducing two interactive visualizations, a timeline and a scatterplot, within the Sprite system to support planning and evaluation tasks. A between-subjects usability study shows that users of the visualizations identify a more diverse set of model-error examples than users of a query-language baseline. They also experience higher usability and lower cognitive load. The findings suggest that visual analytics can help subject-domain experts efficiently sample informative frames and guide labeling that improves CV model performance, particularly for temporally correlated video data. The work also discusses how these design principles might generalize to other time-series data, and the potential of incorporating helper models to guide sampling. Overall, the approach advances interactive machine teaching in CV by strengthening model diagnosis and data-driven improvement workflows.
Abstract
Creating Computer Vision (CV) models remains a complex practice, despite their ubiquity. Access to data, the need for ML expertise, and model opacity are just a few sources of complexity that limit end-users' ability to build, inspect, and improve these models. Interactive ML perspectives have helped address some of these issues by placing a teacher in the loop, where planning, teaching, and evaluating tasks take place. We present and evaluate two interactive visualizations in the context of Sprite, a system for creating CV classification and detection models for images originating from videos. We study how these visualizations help Sprite's users identify (evaluate) and select (plan) images where a model is struggling and that can lead to improved performance. We compare them against a baseline condition in which users worked with a query language. We found that users of the visualizations found more such images, spanning a wider range of potential model error types.
