Understanding How Blind Users Handle Object Recognition Errors: Strategies and Challenges
Jonggi Hong, Hernisa Kacorri
TL;DR
This work investigates how blind and low-vision users handle errors in camera-based object recognition. Using URCam, a fine-tuned teachable recognizer, the study combines remote interviews and a hands-on error-identification task with 12 participants to reveal users' strategies, confidence, and time spent when dealing with misrecognitions. Key findings show that participants independently review photo quality, frequently rely on contextual cues, and identify roughly half of the errors, and that repeating the task does not reliably improve error detection. The results offer design implications for accessible interfaces that support error awareness and mitigation in object recognition systems, aiming to close the gap between benchmark accuracy and real-world usability.
Abstract
Object recognition technologies hold the potential to support blind and low-vision people in navigating the world around them. However, the gap between benchmark performance and practical usability remains a significant challenge. This paper presents a study aimed at understanding how blind users interact with object recognition systems to identify and avoid errors. Leveraging a pre-existing object recognition system, URCam, fine-tuned for our experiment, we conducted a user study involving 12 blind and low-vision participants. Through in-depth interviews and hands-on error identification tasks, we gained insights into users' experiences, challenges, and strategies for identifying errors in camera-based assistive technologies and object recognition systems. During interviews, many participants preferred independent error review, while expressing apprehension toward misrecognitions. In the error identification task, participants varied viewpoints, backgrounds, and object sizes in their images to avoid and overcome errors. Even after repeating the task, participants identified only half of the errors, and the proportion of errors identified did not differ significantly from their first attempts. Based on these insights, we offer implications for designing accessible interfaces tailored to the needs of blind and low-vision users in identifying object recognition errors.
