Identifying Crucial Objects in Blind and Low-Vision Individuals' Navigation

Md Touhidul Islam, Imran Kabir, Elena Ariel Pearce, Md Alimoor Reza, Syed Masum Billah

TL;DR

This work tackles the lack of accessibility-aware annotations for BLV navigation by building a 90-object taxonomy (final set $L_u$) from 21 publicly available BLV navigation videos and refining it through a six-person focus group. It provides detailed object labeling across 31 video segments and demonstrates that major datasets largely miss many items critical for BLV safety and guidance, revealing gaps for current AI tools. The study shows that open-vocabulary and VQA-capable models (RAM, BLIP, GPV-1) outperform traditional detection/segmentation models but still struggle with key object groups, underscoring the need for accessibility-focused data and user-centric design. By releasing the object list, the annotated videos, and labeling, the authors lay groundwork for more inclusive navigation aids and highlight directions for dataset development and evaluation in assistive vision. The work emphasizes balancing AI assistance with physical aids and user customization to ensure reliable, context-aware navigation support for BLV individuals.

Abstract

This paper presents a curated list of 90 objects essential for the navigation of blind and low-vision (BLV) individuals, encompassing road, sidewalk, and indoor environments. We develop the initial list by analyzing 21 publicly available videos featuring BLV individuals navigating various settings. Then, we refine the list through feedback from a focus group study involving blind and low-vision individuals and sighted companions of BLV individuals. A subsequent analysis reveals that most contemporary datasets used to train recent computer vision models contain only a small subset of the objects in our proposed list. Furthermore, we provide detailed object labeling for these 90 objects across 31 video segments derived from the original 21 videos. Finally, we make the object list, the 21 videos, and the object labeling in the 31 video segments publicly available. This paper aims to fill the existing gap and foster the development of more inclusive and effective navigation aids for the BLV community.

Paper Structure (18 sections, 1 equation, 4 figures, 4 tables)


Figures (4)

  • Figure 1: A heatmap representing the existence of different objects of our list in prominent datasets. A blue cell (#0000FF) means the corresponding object exists in the corresponding dataset. In contrast, a light gray cell (#D3D3D3) means the object does not exist in the corresponding dataset.
  • Figure 2: Bar chart representing the object distribution in our annotated data. Each bar represents the number of keyframes in which an object (as labeled on the x-axis) was present. The x-axis also shows the ID of the parent concept or group (as described in the taxonomy table) to which each object belongs. The y-axis is in logarithmic scale.
  • Figure 3: A heatmap representing the class-wise F1 score of all the selected models (shown in the model-type table).
  • Figure 4: A heatmap representing the class-wise F1 score of all the selected models for the objects of groups 3, 5, and 7 (shown in the taxonomy table).
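
The class-wise F1 scores shown in Figures 3 and 4 can be illustrated with a minimal sketch. It assumes each keyframe carries a set of ground-truth object labels and a set of predicted labels, and that a class counts as detected when it appears in the predicted set for that keyframe. This per-keyframe presence protocol, the function name `classwise_f1`, and the toy labels are assumptions made for illustration; the paper's actual evaluation procedure may differ.

```python
from typing import Dict, List, Set


def classwise_f1(
    truth: List[Set[str]],
    predicted: List[Set[str]],
    classes: List[str],
) -> Dict[str, float]:
    """Per-class F1 over keyframes, treating each class as present/absent per frame."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must cover the same keyframes")
    scores: Dict[str, float] = {}
    for cls in classes:
        # Count true positives, false positives, and false negatives per class.
        tp = sum(1 for t, p in zip(truth, predicted) if cls in t and cls in p)
        fp = sum(1 for t, p in zip(truth, predicted) if cls not in t and cls in p)
        fn = sum(1 for t, p in zip(truth, predicted) if cls in t and cls not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined here as 0.0 when the class never occurs.
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data (hypothetical labels, not taken from the paper's annotations).
truth = [{"stairs", "door"}, {"stairs"}, {"door"}]
predicted = [{"stairs"}, {"stairs", "door"}, {"door"}]
scores = classwise_f1(truth, predicted, ["stairs", "door", "curb"])
print(scores)
```

Arranging such per-class scores with one row per model and one column per object gives the kind of matrix that a heatmap like Figure 3 visualizes.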