Identifying Crucial Objects in Blind and Low-Vision Individuals' Navigation
Md Touhidul Islam, Imran Kabir, Elena Ariel Pearce, Md Alimoor Reza, Syed Masum Billah
TL;DR
This work addresses the lack of accessibility-aware annotations for BLV navigation by building a 90-object taxonomy (final set $L_u$) from 21 publicly available BLV navigation videos and refining it through a six-person focus group. It provides detailed object labeling across 31 video segments and shows that major datasets omit many items critical for BLV safety and guidance, which exposes gaps in current AI tools. The study finds that open-vocabulary and VQA-capable models (RAM, BLIP, GPV-1) outperform traditional detection and segmentation models but still struggle with key object groups, underscoring the need for accessibility-focused data and user-centric design. By releasing the object list, the videos, and the object labels, the authors lay groundwork for more inclusive navigation aids and point to directions for dataset development and evaluation in assistive vision. The work emphasizes balancing AI assistance with physical aids and user customization to ensure reliable, context-aware navigation support for BLV individuals.
Abstract
This paper presents a curated list of 90 objects essential for the navigation of blind and low-vision (BLV) individuals, encompassing road, sidewalk, and indoor environments. We develop the initial list by analyzing 21 publicly available videos featuring BLV individuals navigating various settings. Then, we refine the list through feedback from a focus group study involving blind and low-vision individuals as well as sighted companions of BLV individuals. A subsequent analysis reveals that most contemporary datasets used to train recent computer vision models contain only a small subset of the objects in our proposed list. Furthermore, we provide detailed object labeling for these 90 objects across 31 video segments derived from the original 21 videos. Finally, we make the object list, the 21 videos, and the object labeling in the 31 video segments publicly available. This paper aims to fill the existing gap and foster the development of more inclusive and effective navigation aids for the BLV community.
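The dataset-coverage analysis mentioned in the abstract can be pictured as a set comparison between the proposed object list and a dataset's label vocabulary. The following is a minimal sketch under stated assumptions, not the authors' procedure: the function name `coverage` is hypothetical, the object names are illustrative placeholders rather than entries from the released list, and matching is by exact normalized string, so a real comparison would likely also need synonym handling.

```python
def coverage(taxonomy, dataset_labels):
    """Compare a taxonomy against a dataset's label set.

    Returns (covered, missing, fraction), where fraction is the share of
    taxonomy items that also appear in dataset_labels.
    Assumption: labels match only by exact string after trimming and
    lowercasing; synonyms such as "kerb" vs. "curb" are not resolved.
    """
    def normalize(label):
        return label.strip().lower()

    taxonomy_set = {normalize(t) for t in taxonomy}
    dataset_set = {normalize(d) for d in dataset_labels}

    covered = taxonomy_set & dataset_set
    missing = taxonomy_set - dataset_set
    fraction = len(covered) / len(taxonomy_set) if taxonomy_set else 0.0
    return sorted(covered), sorted(missing), fraction


# Illustrative placeholders only; not the paper's actual 90-object list
# and not a complete vocabulary of any real dataset.
taxonomy = ["Tactile paving", "Curb", "Traffic light", "Bench", "Stairs"]
dataset_labels = ["traffic light", "bench", "person", "car"]

covered, missing, fraction = coverage(taxonomy, dataset_labels)
print("covered:", covered)
print("missing:", missing)
print("fraction covered:", fraction)
```

Running the same comparison against each dataset's vocabulary would yield a per-dataset list of missing objects, which is the kind of gap the abstract describes.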
