Identifying Crucial Objects in Blind and Low-Vision Individuals' Navigation

Md Touhidul Islam; Imran Kabir; Elena Ariel Pearce; Md Alimoor Reza; Syed Masum Billah

TL;DR

This work tackles the lack of accessibility-aware annotations for BLV navigation by building a 90-object taxonomy (final set $L_u$) from 21 publicly available BLV navigation videos and refining it through a six-person focus group. It provides detailed object labeling across 31 video segments and demonstrates that major datasets largely miss many items critical for BLV safety and guidance, revealing gaps for current AI tools. The study shows that open-vocabulary and VQA-capable models (RAM, BLIP, GPV-1) outperform traditional detection/segmentation models but still struggle with key object groups, underscoring the need for accessibility-focused data and user-centric design. By releasing the object list, the annotated videos, and labeling, the authors lay groundwork for more inclusive navigation aids and highlight directions for dataset development and evaluation in assistive vision. The work emphasizes balancing AI assistance with physical aids and user customization to ensure reliable, context-aware navigation support for BLV individuals.

Abstract

This paper presents a curated list of 90 objects essential for the navigation of blind and low-vision (BLV) individuals, encompassing road, sidewalk, and indoor environments. We develop the initial list by analyzing 21 publicly available videos featuring BLV individuals navigating various settings. We then refine the list through feedback from a focus group study involving blind individuals, low-vision individuals, and sighted companions of BLV individuals. A subsequent analysis reveals that most contemporary datasets used to train recent computer vision models contain only a small subset of the objects in our proposed list. Furthermore, we provide detailed object labeling for these 90 objects across 31 video segments derived from the original 21 videos. Finally, we make the object list, the 21 videos, and the object labeling in the 31 video segments publicly available. This paper aims to fill the existing gap and foster the development of more inclusive and effective navigation aids for the BLV community.

Paper Structure

This paper contains 18 sections, 1 equation, 4 figures, and 4 tables.

Motivation
Background and Related Work
Identification of Objects with Accessibility Impact
Revising Objects with Accessibility Impact: A Focus Group Study
Procedure
Feedback on the Initial List
Revised Taxonomy and Design Implications
Analyzing the Object List: Coverage in Prominent Datasets
Object Labeling
Appendix: Video Analysis and Object Labeling Details
Video Analysis
Object Labeling
Analyzing Annotated Data
Appendix: Preliminary Evaluations Using Our Object Labeling
Model Selection
...and 3 more sections

Figures (4)

Figure 1: A heatmap representing the existence of different objects of our list in prominent datasets. A blue cell (#0000FF) means the corresponding object exists in the corresponding dataset. In contrast, a light gray cell (#D3D3D3) means the object does not exist in the corresponding dataset.
Figure 2: Bar chart representing the object distribution in our annotated data. Each bar represents the number of keyframes in which an object (as labeled on the x-axis) was present. The x-axis also shows the id of the parent concept or group (as described in the taxonomy table) to which each object belongs. The y-axis is in logarithmic scale.
Figure 3: A heatmap representing the classwise F1 score of all the selected models (shown in the model-type table).
Figure 4: A heatmap representing the classwise F1 score of all the selected models for the objects of groups 3, 5, and 7 (shown in the taxonomy table).
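
As a rough illustration of the classwise F1 scores plotted in Figures 3 and 4, the sketch below computes a per-object F1 from keyframe-level presence labels. It is only a minimal sketch under stated assumptions: it treats each keyframe as a set of object names that are present, and it is not necessarily the evaluation protocol used in the paper. The function name `classwise_f1`, the sample object names, and the sample labels are all hypothetical.

```python
from typing import Dict, List, Set


def classwise_f1(truth: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Per-object F1 over keyframes.

    Assumption: truth[i] and pred[i] are the sets of object names
    annotated and predicted as present in keyframe i.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must cover the same keyframes")

    # Every object that appears in either the annotations or the predictions.
    classes = set().union(*truth, *pred)

    scores: Dict[str, float] = {}
    for name in sorted(classes):
        tp = sum(1 for t, p in zip(truth, pred) if name in t and name in p)
        fp = sum(1 for t, p in zip(truth, pred) if name not in t and name in p)
        fn = sum(1 for t, p in zip(truth, pred) if name in t and name not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); fall back to 0.0 if nothing was counted.
        scores[name] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical labels for three keyframes (object names are illustrative only).
truth = [{"curb", "stairs"}, {"curb", "stairs"}, {"stairs"}]
pred = [{"curb", "stairs"}, {"stairs"}, {"curb"}]

scores = classwise_f1(truth, pred)
for name, f1 in scores.items():
    print(f"{name}: {f1:.2f}")
```

Arranging such per-object scores as one row per model and one column per object gives the kind of matrix that a classwise F1 heatmap visualizes.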
