State- and context-dependent robotic manipulation and grasping via uncertainty-aware imitation learning

Tim R. Winter, Ashok M. Sundaram, Werner Friedl, Maximo A. Roa, Freek Stulp, João Silvério

TL;DR

This work builds on existing policy fusion methods with uncertainty quantification to propose a state-dependent approach that automatically returns to demonstrations, avoiding unpredictable behavior while smoothly adapting to context changes.

Abstract

Generating context-adaptive manipulation and grasping actions is a challenging problem in robotics. Classical planning and control algorithms tend to be inflexible with regard to parameterization by external variables such as object shapes. In contrast, Learning from Demonstration (LfD) approaches, due to their nature as function approximators, allow for introducing external variables to modulate policies in response to the environment. In this paper, we utilize this property by introducing an LfD approach to acquire context-dependent grasping and manipulation strategies. We treat the problem as kernel-based function approximation, where the kernel inputs include generic context variables describing task-dependent parameters such as the object shape. We build on existing work on policy fusion with uncertainty quantification to propose a state-dependent approach that automatically returns to demonstrations, avoiding unpredictable behavior while smoothly adapting to context changes. The approach is evaluated on the LASA handwriting dataset and on a real 7-DoF robot in two scenarios: adapting to slippage during grasping, and manipulating a deformable food item.
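
To make the idea of "kernel inputs that include context variables" concrete, the following is a minimal sketch and not the authors' implementation. It uses plain Gaussian-process regression as a generic stand-in for the kernel method, with a squared-exponential kernel over the concatenated input [state, context]. The function names (`rbf_kernel`, `fit_gp`, `predict`), the length scale, the noise level, and the toy data are all assumptions made for illustration. The predictive variance plays the role of an epistemic uncertainty: it is small near the demonstrations and grows toward the prior value far from them.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.5):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def fit_gp(X, Y, noise=1e-3, length_scale=0.5):
    """Precompute the Cholesky factor and weights for inputs X = [state, context]."""
    K = rbf_kernel(X, X, length_scale) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    return {"X": X, "L": L, "alpha": alpha, "ls": length_scale}

def predict(model, X_query):
    """Return the predictive mean and variance at the query inputs."""
    K_s = rbf_kernel(X_query, model["X"], model["ls"])
    mean = K_s @ model["alpha"]
    v = np.linalg.solve(model["L"], K_s.T)
    var = 1.0 - (v**2).sum(axis=0)  # the prior variance of this kernel is 1
    return mean, np.clip(var, 0.0, None)

# Toy data (assumed): a 1-D state x in [0, 1] and a scalar context c in {0, 1}.
# The demonstrated velocity is +1 under context 0 and -1 under context 1.
x = np.linspace(0.0, 1.0, 20)
X = np.vstack([
    np.column_stack([x, np.zeros_like(x)]),
    np.column_stack([x, np.ones_like(x)]),
])
Y = np.concatenate([np.ones(20), -np.ones(20)])[:, None]

model = fit_gp(X, Y)
mean_c0, var_near = predict(model, np.array([[0.5, 0.0]]))  # on the data, context 0
mean_c1, _ = predict(model, np.array([[0.5, 1.0]]))         # same state, context 1
_, var_far = predict(model, np.array([[5.0, 0.0]]))         # far from all data
```

In this toy setting the same state yields opposite velocities under the two contexts, and the variance at the distant query is close to the prior variance, which is the kind of signal an uncertainty-aware fusion scheme can use to decide when to return to the demonstrations.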

Paper Structure

This paper contains 18 sections, 9 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Manipulating deformable objects, such as food items, often requires different strategies depending on the variable object shape and grasp configuration. In the above images we see how placing a deformable piece of silicone fish on a tray requires different approach and place strategies depending on how it has been grasped. Instead of accurately modeling the different possibilities, we propose to learn them from human demonstrations.
  • Figure 2: Two-dimensional example showing the individual underlying policies and their mixture, represented as vector fields, when trained on multiple demonstrations of the handwritten letter 'Z'. (a) shows the mean of the LfD policy, (b) illustrates the mean of the stabilizing policy, (c) displays the mean of the goal attractor policy and (d) demonstrates the combination of the individual policies using a mixture of experts (MoE).
  • Figure 3: Two-dimensional example, including context, showing the epistemic uncertainties and the vector fields of our approach for three different context values each corresponding to one of the demonstrated handwritten letters. Here (a) shows three clusters of context variables used for training, distinguished by a color code, each consisting of a two-dimensional pair of values ($c_1$, $c_2$). The subfigures (b), (c), (d) show the vector fields and epistemic uncertainties for inputs $c = [0\ 0]$, $c = [1\ 1]$, $c = [2\ 2]$, respectively.
  • Figure 4: Setup and results of the re-grasp experiments: (a) experimental setup, (b) time evolution of the end-effector z-value in response to losing the object, (c) reproduced trajectory associated with $c = 0$, uncertainty and vector field displayed in the y-z plane at $t_1$, (d) reproduced trajectory associated with $c = 1$, uncertainty and vector field displayed in the y-z plane at $t_2$.
  • Figure 5: Results of the fish placing experiment. The figure shows a time sequence of the reproduced trajectory, the demonstrations, the epistemic uncertainty and the vector fields for the cases where the fish hangs on the left and right. Furthermore, in the bottom-right corner of the individual plots, the current configuration of the fish at the respective time and the segmented points, from which we derive the context values, can be seen.
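
The combination of an LfD policy, a stabilizing policy, and a goal attractor policy described in the Figure 2 caption can be illustrated with a small sketch. This is an assumption-laden toy and not the paper's gating: the function `fused_velocity`, the distance-based uncertainty proxy, the gate shapes, and the parameters `length_scale` and `goal_radius` are all hypothetical. The only properties it aims to show are that the weights sum to one, that the imitation term dominates on the demonstration, that the stabilizing term dominates far from it, and that the goal attractor takes over near the goal.

```python
import numpy as np

def fused_velocity(x, demos, demo_vels, goal, length_scale=0.2, goal_radius=0.1):
    """Blend three velocity fields with weights that sum to one (hypothetical gating)."""
    dists = np.linalg.norm(demos - x, axis=1)
    i = int(np.argmin(dists))

    # Uncertainty proxy in [0, 1): 0 on the demonstration, close to 1 far from it.
    uncertainty = 1.0 - np.exp(-0.5 * (dists[i] / length_scale) ** 2)
    # Goal gate in (0, 1]: 1 at the goal, close to 0 far from it.
    g_goal = np.exp(-0.5 * (np.linalg.norm(goal - x) / goal_radius) ** 2)

    v_lfd = demo_vels[i]   # imitate the demonstrated velocity
    v_stab = demos[i] - x  # pull back toward the nearest demonstrated state
    v_goal = goal - x      # converge to the goal

    w = np.array([
        (1.0 - g_goal) * (1.0 - uncertainty),  # LfD weight
        (1.0 - g_goal) * uncertainty,          # stabilizing weight
        g_goal,                                # goal attractor weight
    ])
    return w[0] * v_lfd + w[1] * v_stab + w[2] * v_goal, w

# Toy demonstration (assumed): a straight line from (0, 0) to (1, 0) at unit speed.
demos = np.column_stack([np.linspace(0.0, 1.0, 21), np.zeros(21)])
demo_vels = np.tile(np.array([1.0, 0.0]), (21, 1))
goal = np.array([1.0, 0.0])

v_on, w_on = fused_velocity(np.array([0.5, 0.0]), demos, demo_vels, goal)    # on the demo
v_off, w_off = fused_velocity(np.array([0.5, 1.0]), demos, demo_vels, goal)  # far from it
v_end, w_end = fused_velocity(goal, demos, demo_vels, goal)                  # at the goal
```

On the demonstration the blended velocity follows the demonstrated direction, off the demonstration it points back toward the data, and at the goal it vanishes. A learned model would replace the distance-based proxy with its own predictive uncertainty.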