Immersive Explainability: Visualizing Robot Navigation Decisions through XAI Semantic Scene Projections in Virtual Reality

Jorge de Heuvel, Sebastian Müller, Marlene Wessels, Aftab Akhtar, Christian Bauckhage, Maren Bennewitz

TL;DR

This work addresses the opacity of RL-based robot navigation by introducing an immersive VR interface that grounds XAI attributions in semantic scene elements and overlays the robot's lidar perception. By mapping gradient-based attributions for the linear velocity output onto objects, the system lets non-experts see which scene components influence navigation decisions. A within-subjects user study with 24 participants shows that semantic XAI projections significantly improve objective understanding and subjective awareness of robot behavior, while lidar visualization further improves perceived predictability. The findings support immersive, scene-grounded explainability as a practical approach to calibrating trust and improving human-robot collaboration in complex environments.

Abstract

End-to-end robot policies achieve high performance through neural networks trained via reinforcement learning (RL). Yet, their black-box nature and abstract reasoning pose challenges for human-robot interaction (HRI), because humans may have difficulty understanding and predicting the robot's navigation decisions, which hinders trust development. We present a virtual reality (VR) interface that visualizes explainable AI (XAI) outputs and the robot's lidar perception to support intuitive interpretation of RL-based navigation behavior. By visually highlighting objects based on their attribution scores, the interface grounds abstract policy explanations in the scene context. This XAI visualization bridges the gap between obscure numerical XAI attribution scores and a human-centric semantic level of explanation. A within-subjects study with 24 participants evaluated the effectiveness of our interface for four visualization conditions combining XAI and lidar. Participants ranked scene objects across navigation scenarios based on their importance to the robot, followed by a questionnaire assessing subjective understanding and predictability. Results show that semantic projection of attributions significantly enhances non-expert users' objective understanding and subjective awareness of robot behavior. In addition, lidar visualization further improves perceived predictability, underscoring the value of integrating XAI and sensor visualization for transparent, trustworthy HRI.

Paper Structure

This paper contains 22 sections, 2 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Our immersive VR explainability interface communicates XAI attributions and sensor perception of an RL robot navigation policy to non-expert users by grounding them in the object semantics of the scene. Objects that are important to the policy are highlighted using a glowing outline. A better understanding of the robot's perception, in combination with the user's perceived ability to predict the robot, can lead to calibrated trust towards the robot.
  • Figure 2: Architecture of our XAI-VR interface. a) The VR interface visualizes the robot in a navigation scenario, the object-projected XAI attribution scores, and the 2D lidar sensor to the user. In our user study, the visualizations (XAI and lidar) represent the independent variables (IVs), while we measure the users' performance in ranking the robot-surrounding objects according to their importance to the robot, defined by the visualized attribution scores. b) Objects are highlighted according to their importance by a white outline of variable thickness, here depicted in a top-down schematic. Their importance is assigned by the ray-casts of the 2D lidar sensor, which project the post-processed attribution scores of the lidar-containing state space into the scene. Specifically, the state space contains a min-pooled set of lidar readings and the robot-centric goal position. c) The XAI technique Vanilla Gradient generates gradient-based attributions $\vec{g}$ for the RL-trained navigation policy. The lidar-related part of $\vec{g}$ is post-processed for visualization in VR using Eq. \ref{eq:postprocessing}.
  • Figure 3: a) The distribution of raw lidar attribution scores $\vec{g}$ provided by Vanilla Gradient for all navigation state-action pairs presented during the user study. b) After postprocessing for visualization (Eq. \ref{eq:postprocessing}), the distribution of $\vec{g}^*$ shifts into the range $[0,1]$.
  • Figure 4: Example scenes with post-processed XAI attribution scores $\vec{g}^*$ of the linear velocity output, indicated as color-coding for their respective min-pooled lidar ray. The robot (black triangle) is facing to the right, while different obstacles (grey boxes) influence the navigation policy that should pursue the goal (green dot). Depending on the scene setup, the obstacles' influence on the policy varies. Axis ticks denote $1\,\mathrm{m}$ distances.
  • Figure 5: a) Fully crossed combinations of two independent variables (IVs): XAI scene projection and 2D lidar sensor visualization. Together they yield four experimental conditions, represented by four blocks. Each block was followed by a questionnaire. b) Participants started each trial by pressing the A button on the controller. The robot navigated for 3 s, halted, and the XAI and/or lidar visualization remained after another second. Afterwards, participants ranked the importance of five scene objects for the robot policy.
  • ...and 2 more figures
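
The pipeline outlined in the Figure 2 caption (min-pooled lidar readings plus a goal position as the state, Vanilla Gradient attributions $\vec{g}$, and projection of the lidar-related attributions onto scene objects via the rays) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tiny randomly initialized tanh network standing in for the trained policy, the bin count, the hypothetical per-ray object ids, and the max-of-absolute-values rule used to aggregate ray attributions per object. The paper's own post-processing equation is not shown in this excerpt and is not reproduced.

```python
import numpy as np

def min_pool(scan, n_bins):
    """Reduce a lidar scan to n_bins values, keeping the closest return per bin."""
    bins = np.array_split(np.arange(scan.size), n_bins)
    idx = np.array([b[np.argmin(scan[b])] for b in bins])
    return scan[idx], idx  # pooled ranges and the index of the ray kept per bin

def policy(state, W1, b1, W2, b2):
    """Stand-in policy (assumption): 1-hidden-layer tanh MLP with a scalar output."""
    return float(W2 @ np.tanh(W1 @ state + b1) + b2)

def vanilla_gradient(state, W1, b1, W2, b2):
    """Vanilla Gradient: derivative of the scalar output w.r.t. each input entry."""
    h = np.tanh(W1 @ state + b1)
    return W1.T @ (W2 * (1.0 - h ** 2))

def object_scores(attr, hit_ids):
    """Aggregate ray attributions per hit object (assumed rule: max of |attr|)."""
    scores = {}
    for a, obj in zip(np.abs(attr), hit_ids):
        if obj >= 0:  # -1 marks a ray that hits no labelled object
            scores[int(obj)] = max(scores.get(int(obj), 0.0), float(a))
    return scores

rng = np.random.default_rng(0)
scan = rng.uniform(0.5, 5.0, size=24)       # synthetic lidar ranges in metres
ray_object = rng.integers(-1, 3, size=24)   # hypothetical object id hit by each ray
pooled, idx = min_pool(scan, n_bins=8)
goal = np.array([2.0, 0.5])                 # robot-centric goal position (made up)
state = np.concatenate([pooled, goal])

# Random weights replace the RL-trained policy purely for demonstration.
W1 = 0.1 * rng.normal(size=(16, state.size))
b1 = 0.1 * rng.normal(size=16)
W2 = rng.normal(size=16)
b2 = 0.1

g = vanilla_gradient(state, W1, b1, W2, b2)
g_lidar = g[:pooled.size]                   # keep only the lidar-related part
scores = object_scores(g_lidar, ray_object[idx])
ranking = sorted(scores, key=scores.get, reverse=True)
print("objects ranked by attribution:", ranking)
```

With a real policy, the analytic gradient would typically come from an automatic-differentiation framework; the closed-form expression is used here only to keep the sketch dependency-free.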