A Dataset for Crucial Object Recognition in Blind and Low-Vision Individuals' Navigation

Md Touhidul Islam, Imran Kabir, Elena Ariel Pearce, Md Alimoor Reza, Syed Masum Billah

TL;DR

This work tackles the gap in BLV navigation by constructing a dataset with 21 navigational videos and a refined 90-object taxonomy, validated through a focus-group study. It documents ground-truth labeling across 31 video segments and analyzes coverage relative to mainstream datasets, revealing substantial gaps for BLV-relevant objects. Preliminary evaluations of seven vision-language models show limited ability to detect key accessibility-related objects, underscoring the need for accessibility-aware training data and potential few-shot adaptation of foundation models. By releasing the dataset publicly, the authors aim to enable retraining and development of more robust, proactive navigation aids for blind and low-vision individuals.

Abstract

This paper introduces a dataset for improving real-time object recognition systems to aid blind and low-vision (BLV) individuals in navigation tasks. The dataset comprises 21 videos of BLV individuals navigating outdoor spaces, and a taxonomy of 90 objects crucial for BLV navigation, refined through a focus group study. We also provide object labeling for the 90 objects across 31 video segments created from the 21 videos. A deeper analysis reveals that most contemporary datasets used in training computer vision models contain only a small subset of the taxonomy in our dataset. Preliminary evaluation of state-of-the-art computer vision models on our dataset highlights shortcomings in accurately detecting key objects relevant to BLV navigation, emphasizing the need for specialized datasets. We make our dataset publicly available, offering valuable resources for developing more inclusive navigation systems for BLV individuals.

Paper Structure

This paper contains 41 sections, 1 equation, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Bar chart of the object distribution in our annotated data. Each bar shows the number of keyframes in which an object (labeled on the x-axis) was present. The x-axis also shows the ID of the parent concept or group (as described in Table \ref{table:taxonomy}) to which each object belongs. The y-axis uses a logarithmic scale.
  • Figure 2: A heatmap showing which objects from our list $L_u$ are present in prominent datasets.
  • Figure 3: A heatmap of the classwise F1 scores of all the selected models (listed in Table \ref{table:model_type}).
  • Figure 4: A heatmap of the classwise F1 scores of all the selected models for the objects in groups 3, 5, and 7 (as defined in Table \ref{table:taxonomy}).
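
The classwise F1 score reported in Figures 3 and 4 can be illustrated with a minimal sketch. It assumes a presence-level protocol, in which each keyframe carries a set of ground-truth labels and a model returns a set of predicted labels; the paper's exact matching protocol is not stated in this summary, so this is an assumption. The function name `classwise_f1` and the labels `curb`, `stairs`, and `pole` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def classwise_f1(ground_truth, predictions):
    """Compute a per-class F1 score from per-keyframe label sets.

    ground_truth, predictions: equal-length lists of sets of labels,
    one set per keyframe (assumed presence-level evaluation).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(ground_truth, predictions):
        for label in truth & pred:   # present and predicted
            tp[label] += 1
        for label in pred - truth:   # predicted but absent
            fp[label] += 1
        for label in truth - pred:   # present but missed
            fn[label] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical labels for three keyframes (illustrative only).
truth = [{"curb", "stairs"}, {"curb"}, {"stairs"}]
preds = [{"curb"}, {"pole"}, {"stairs"}]
scores = classwise_f1(truth, preds)
for label in sorted(scores):
    print(f"{label}: {scores[label]:.2f}")
```

Arranging such scores with one row per model and one column per object gives the kind of matrix a heatmap like Figure 3 would display.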