Demonstration Based Explainable AI for Learning from Demonstration Methods

Morris Gu, Elizabeth Croft, Dana Kulic

TL;DR

This work addresses the interpretability gap in Learning from Demonstration by introducing an adaptive explanatory feedback system that generates and selectively presents trajectory demonstrations from the learned policy via MaxEntIRL. The method categorizes explanatory trajectories as successful or unsuccessful based on termination states in $T_u$ and maintains a balanced sampling ratio to visualize both outcomes, enabling users to form a more accurate mental model. A two-condition user study (EF vs NF) with 26 participants on a grid-world navigation task shows that explanatory feedback improves robot performance and teaching efficiency, and enhances user understanding as reflected in prediction accuracy, though perception-related metrics show limited change in a between-subjects design. The findings suggest that XAI-driven explanatory feedback can meaningfully augment human-to-robot teaching, with potential for generalization to broader LfD systems and tasks, and point to future work implementing more complex tasks and disambiguating local versus global policy understanding.

Abstract

Learning from Demonstration (LfD) is a powerful type of machine learning that can allow novices to teach and program robots to complete various tasks. However, the learning process for these systems may still be difficult for novices to interpret and understand, making effective teaching challenging. Explainable artificial intelligence (XAI) aims to address this challenge by explaining a system to the user. In this work, we investigate XAI within LfD by implementing an adaptive explanatory feedback system on an inverse reinforcement learning (IRL) algorithm. The feedback is implemented by demonstrating selected learnt trajectories to users. The system adapts to user teaching by categorizing and then selectively sampling trajectories shown to a user, to show a representative sample of both successful and unsuccessful trajectories. The system was evaluated through a user study with 26 participants teaching a robot a navigation task. The results of the user study demonstrated that the proposed explanatory feedback system can improve robot performance, teaching efficiency and user understanding of the robot.
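The categorize-then-sample step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a trajectory counts as successful when its final state lies in a set of user-demonstrated goal states, and that the displayed sample aims for a fixed share of successful trajectories. The names `categorize`, `select_for_display`, `demonstrated_goals`, and `success_ratio` are hypothetical.

```python
import random

def categorize(trajectories, demonstrated_goals):
    """Split trajectories by whether their final state is a demonstrated goal.

    Assumption: "successful" means the last state is in demonstrated_goals.
    """
    successful, unsuccessful = [], []
    for traj in trajectories:
        if traj and traj[-1] in demonstrated_goals:
            successful.append(traj)
        else:
            unsuccessful.append(traj)
    return successful, unsuccessful

def select_for_display(trajectories, demonstrated_goals, k,
                       success_ratio=0.5, rng=None):
    """Pick up to k trajectories, aiming for the given share of successes.

    Assumption: success_ratio is a hypothetical tuning parameter; the
    ratio used in the paper is not stated in this summary.
    """
    rng = rng or random.Random()
    successful, unsuccessful = categorize(trajectories, demonstrated_goals)
    n_success = min(len(successful), round(k * success_ratio))
    n_fail = min(len(unsuccessful), k - n_success)
    # If there are too few unsuccessful trajectories, top up with successes.
    n_success = min(len(successful), k - n_fail)
    return rng.sample(successful, n_success) + rng.sample(unsuccessful, n_fail)

# Example on a small grid world: states are (row, col) tuples.
goals = {(4, 4)}
sampled = [
    [(0, 0), (1, 1), (4, 4)],
    [(0, 0), (2, 2), (4, 4)],
    [(0, 0), (3, 3), (4, 4)],
    [(0, 0), (0, 1)],
    [(0, 0), (1, 0)],
    [(0, 0), (2, 0)],
]
shown = select_for_display(sampled, goals, k=4, rng=random.Random(0))
```

When one category runs short, the sketch fills the remaining slots from the other category rather than showing fewer trajectories. That is a design choice made here for illustration only.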

Paper Structure (17 sections, 9 figures, 2 tables)

Figures (9)

  • Figure 1: High-level overview of a learning from demonstration system that incorporates policy explanatory feedback. The explanatory feedback module is designed to improve human-to-robot teaching through greater user understanding of learnt robot behaviour.
  • Figure 2: A flow diagram for generating and visualizing explanatory trajectories including the specific process for generating and categorizing trajectories (GET), highlighted in the green box to the right.
  • Figure 3: An example of the grid world environment employed in the user studies. Example demonstrations are visualized as red and blue arrows, originating from the top and bottom of the grid world respectively. In the diagram, the blue goal, indicated by the pentagon, is considered a user-demonstrated goal as the terminating state of both demonstrations is this goal. The green goal, indicated by a star, is not a user-demonstrated goal, as no demonstrations reach this goal.
  • Figure 4: A flow diagram for the user study which indicates the order of user study sub-tasks within the teaching navigation task.
  • Figure 5: A graph plotting each user's performance against the number of demonstrations. Explanatory feedback (EF) condition participants are visualized in blue; no feedback (NF) condition participants are visualized in orange. Crosses and pluses indicate when each user ended or completed the user study task, and the shaded regions indicate the range of values for each condition.
  • ...and 4 more figures