
Addressing Ambiguity in Imitation Learning through Product of Experts based Negative Feedback

John Bateman, Andy M. Tyrrell, Jihong Zhu

Abstract

Programming robots to perform complex tasks is often difficult and time-consuming, requiring expert knowledge and skills in robot software and sometimes hardware. Imitation learning is a method for training robots to perform tasks by leveraging human expertise through demonstrations. Typically, the assumption is that those demonstrations are performed by a single, highly competent expert. However, in many real-world applications that use user demonstrations for tasks, or that combine user data with pretrained data, such as home robotics including assistive robots, this is unlikely to be the case. This paper presents research towards a system that can leverage suboptimal demonstrations to solve ambiguous tasks and, in particular, learn from its own failures. The result is a negative-feedback system that achieves significant improvement over purely positive imitation learning on ambiguous tasks: a 90% improvement in success rate over a system that does not utilise negative feedback in simulation, and a 50% improvement in success rate on a real robot, while also demonstrating higher efficacy, memory efficiency and time efficiency than a comparable negative-feedback scheme. The novel scheme presented in this paper is validated through simulated and real-robot experiments.

Paper Structure

This paper contains 14 sections, 2 equations, 11 figures, 1 algorithm.

Figures (11)

  • Figure 1: An illustration of the ambiguity problem in a task avoiding an obstacle while navigating to a goal. The two blue lines represent demonstrations, both successes (neither collide with the obstacle), but the central green line, learned from a combination of the two, is a failure, as it averages the two behaviours and collides with the obstacle.
  • Figure 2: An illustration of the mask for an obstacle avoidance task, with zeroes (regions where $N_{threshold}$ or more trajectories pass through) shown in white, ones (regions where fewer than $N_{threshold}$ trajectories pass through) shown in blue, and the obstacle being avoided shown in red.
  • Figure 3: An illustration of the simple ambiguous task tested here, with the two success modes demonstrated: over (left) and under (right).
  • Figure 4: The negative-feedback graph for the simple ambiguous task, showing the improvement in success rate with increasing cycles of negative feedback, comparing three methods for applying negative feedback: a negative-weighting method, a mixture-of-experts method, and the product-of-experts method presented in this paper.
  • Figure 5: The graph showing the impact of different methods of trajectory selection on the same five-cycle run of negative feedback described in \ref{fig:simpleDemos}. Here, 'Mask 75%' and 'Mask 50%' refer to the impacts of using an $N_{threshold}$ of 75% of the trajectories and 50% of the trajectories respectively.
  • ...and 6 more figures
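The mask described in Figure 2 can be sketched as a simple occupancy count over a grid: cells visited by at least $N_{threshold}$ trajectories become zero, all other cells become one. The sketch below is a minimal illustration of that idea only; the function name, grid representation, and trajectory format are hypothetical and not taken from the paper.

```python
import numpy as np

def occupancy_mask(trajectories, grid_shape, n_threshold):
    """Binary mask over a 2-D grid: 0 where at least n_threshold
    trajectories pass through a cell, 1 elsewhere.
    (Hypothetical sketch of the mask shown in Figure 2.)

    trajectories: iterable of trajectories, each a sequence of
                  (row, col) grid cells the trajectory passes through.
    """
    counts = np.zeros(grid_shape, dtype=int)
    for traj in trajectories:
        # Count each trajectory at most once per cell, even if it
        # revisits the same cell.
        for r, c in {(r, c) for r, c in traj}:
            counts[r, c] += 1
    # 1 (blue in Figure 2) where coverage is below threshold,
    # 0 (white) where coverage meets or exceeds it.
    return (counts < n_threshold).astype(int)
```

For example, with two trajectories that overlap only in one cell and $N_{threshold} = 2$, only the shared cell is masked to zero.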