DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding

Shuijing Liu, Aamir Hasan, Kaiwen Hong, Runxuan Wang, Peixin Chang, Zachary Mizrachi, Justin Lin, D. Livingston McPherson, Wendy A. Rogers, Katherine Driggs-Campbell

TL;DR

DRAGON is a dialogue-based robot that grounds natural language in environmental landmarks to aid navigation for visually impaired users. It uses a modular framework that combines natural language understanding (NLU), language grounding, and vision. The framework is anchored by a fine-tuned CLIP model for open-vocabulary landmark recognition, with object-detection and visual question answering (VQA) modules for describing the environment. A user study with five participants shows that DRAGON can understand user intents through dialogue, guide users to landmarks, and describe their surroundings, and that CLIP outperforms a fixed-vocabulary baseline. The work supports the practical viability of vision-language grounding in assistive robotics and points to future improvements in dialogue adaptability and richer environmental reasoning.

Abstract

Persons with visual impairments (PwVI) have difficulties understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding the commands from the user, DRAGON is able to guide the user to the desired landmarks on the map, describe the environment, and answer questions from visual observations. Through effective utilization of dialogue, the robot can ground the user's free-form descriptions to landmarks in the environment, and give the user semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON is able to communicate with the user smoothly, provide a good guiding experience, and connect users with their surrounding environment in an intuitive manner. Videos and code are available at https://sites.google.com/view/dragon-wayfinding/home.
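
The grounding step described above, mapping a free-form description to a landmark on the map, can be illustrated with a minimal sketch. This is not the authors' implementation: the system is described as using a fine-tuned CLIP model, whereas the sketch below swaps in hand-written toy vectors so that it runs without any model. The landmark names, the vectors, the `ground` function, and the `min_score` threshold are all hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def ground(query_vec, landmarks, min_score=0.5):
    """Return (name, score) for the most similar landmark.

    If the best score is below min_score (a hypothetical threshold),
    return (None, score) so the dialogue can ask the user to rephrase.
    """
    best_name, best_score = None, -1.0
    for name, vec in landmarks.items():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score

# Hypothetical landmark embeddings. In a CLIP-based system these would be
# image embeddings of landmark photos, not hand-written numbers.
LANDMARKS = {
    "sofa": [0.9, 0.1, 0.0],
    "vending machine": [0.1, 0.9, 0.1],
    "elevator": [0.0, 0.2, 0.9],
}

# Stand-in for the text embedding of a request such as "somewhere to sit".
query = [0.8, 0.2, 0.1]
name, score = ground(query, LANDMARKS)
print(name, round(score, 3))
```

Under these assumptions, the query is matched to the landmark whose vector points in the most similar direction, and a low best score signals that no landmark fits well enough to start guiding.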

Paper Structure (21 sections, 4 figures, 6 tables)

Figures (4)

  • Figure 1: DRAGON identifies the intents of the user through dialogue, grounds language with the environment, and guides the user to their desired goal.
  • Figure 2: An overview of the system and platform of DRAGON. (a) Submodules, message passing, and user interface. (b) The robot platform.
  • Figure 3: The map of our environment with semantic landmarks. The images are landmarks with locations marked by red dots. The orange lines are the three routes in the user study. The red squares are the starting locations of routes.
  • Figure 4: An example navigation trial with human-robot dialogue in the user study. In the dialogue boxes, "H" denotes the human and "R" denotes the robot. The camera view is shown in the lower right corner.