
Bidirectional Human Interactive AI Framework for Social Robot Navigation

Tuba Girgin, Emre Girgin, Yigit Yildirim, Emre Ugur, Mehmet Haklidir

TL;DR

The paper addresses the need for trustworthy, explainable bidirectional human-robot interaction in social navigation for autonomous mobile robots. It proposes an end-to-end framework that integrates RGB-LIDAR perception, a trajectory forecasting backbone that feeds LSTM trajectory encodings into a Graph Attention Network (GAT), and a trustworthy AI module that verbalizes the robot's decisions while incorporating human input through hand gestures. Key contributions include a novel GNN-based trajectory predictor, a bidirectional audio-visual interaction workflow, and a demonstration in an office environment with a TurtleBot 3, highlighting socially aware planning and conflict resolution. The framework aims to improve safety, transparency, and user comfort in human-centric industrial settings by explaining its decisions and accepting human guidance online. Future work includes full system integration, dataset collection in smart factories, and user surveys to assess the effectiveness of vocal explanations and gesture-based directives.

Abstract

Trustworthiness is a crucial concept in the context of human-robot interaction. Cooperative robots must be transparent regarding their decision-making process, especially when operating in a human-oriented environment. This paper presents a comprehensive end-to-end framework aimed at fostering trustworthy bidirectional human-robot interaction in collaborative environments for the social navigation of mobile robots. In this framework, the robot communicates verbally while the human guides with gestures. Our method enables a mobile robot to predict the trajectory of people and adjust its route in a socially-aware manner. In case of conflict between human and robot decisions, detected through visual examination, the route is dynamically modified based on human preference while verbal communication is maintained. We present our pipeline, framework design, and preliminary experiments that form the foundation of our proposition.

Paper Structure (7 sections, 6 figures, 1 table)


Figures (6)

  • Figure 1: An example trustworthy social navigation scenario is illustrated. First, the robot plans a path (red) avoiding a future collision with the person (gray). The robot verbally clarifies its actions to the non-expert to establish trust. Subsequently, the non-expert directs the robot to a new trajectory using hand gestures. The robot then plans a new path passing on the desired side (green). The robot verbally explains itself again while routing on the new path.
  • Figure 2: A potential scenario of social robot navigation is depicted. The robot adjusts its original path, taking into account the predicted trajectory of an encountered human. However, the robot initiates movement toward the current position of the human, which may induce fear and erode trust. In the second figure, the proposed system is illustrated. To address the discomfort caused by the robot's unpredictable movements, the proposed system explains the trajectory, expecting guidance in return.
  • Figure 3: An example of a human detection and localization pipeline in our office environment. The point cloud (colored by distance) is represented on the image plane by weak perspective projection and used for 3D localization by fusing it with the output of an instance segmentation algorithm. The estimated 3D locations are converted to trajectories by optical flow between consecutive frames.
  • Figure 4: Proposed GAT-based trajectory estimation framework. Human trajectories are encoded with a pretrained LSTM encoder. A dense graph is formed from the encodings to track the relations and context between human trajectories using GAT layers. In addition, an occupancy map is utilized to account for static obstacles in the scene.
  • Figure 5: Self-supervised trajectory encoder output on the Trajnet++ dataset. An encoder-decoder architecture is trained so that its encoder part can be used to produce the GAT input. The architecture successfully recovers the encoded trajectories from the test set.
  • ...and 1 more figures
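
The weak perspective projection mentioned in the Figure 3 caption can be illustrated with a minimal sketch. It assumes the LIDAR points are already expressed in the camera frame and that the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) are known; the function name and all numeric values are hypothetical and are not taken from the paper's implementation. Under weak perspective, every point is scaled by one shared reference depth (here the mean depth) instead of its own depth.

```python
from statistics import fmean


def weak_perspective_project(points, fx, fy, cx, cy):
    """Project camera-frame 3D points (X, Y, Z) to pixel coordinates.

    Weak perspective: all points share one reference depth (the mean Z),
    so the projection reduces to a uniform scaling plus an offset.
    """
    if not points:
        return []
    z_ref = fmean(z for _, _, z in points)
    if z_ref <= 0:
        raise ValueError("reference depth must be positive")
    return [(fx * x / z_ref + cx, fy * y / z_ref + cy) for x, y, _ in points]


# Hypothetical intrinsics and points, for illustration only.
pts = [(0.0, 0.0, 2.0), (1.0, 0.5, 2.0)]
pixels = weak_perspective_project(pts, 500.0, 500.0, 320.0, 240.0)

# Points that differ only in depth land on the same pixel,
# because their individual depths are replaced by the mean depth.
same = weak_perspective_project([(1.0, 0.0, 1.0), (1.0, 0.0, 3.0)],
                                500.0, 500.0, 320.0, 240.0)
```

The approximation is reasonable when the depth spread within an object (such as one person) is small relative to its distance from the camera; a full perspective projection would divide each point by its own depth instead.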