Bimanual rope manipulation skill synthesis through context dependent correction policy learning from human demonstration

T. Baturhan Akbulut, G. Tuba C. Girgin, Arash Mehrabi, Minoru Asada, Emre Ugur, Erhan Oztop

TL;DR

This work tackles covariate shift in learning from demonstration for long-horizon deformable-object manipulation by decomposing complex tasks into motor primitives and inserting a context-dependent corrective primitive learned from demonstrations. The authors implement a CNMP-based framework in which the pre-, cor-, and post-movement primitives are conditioned on a task context, enabling targeted corrections at critical points. Simulated pushing and real-robot knotting experiments show that context-conditioned corrections achieve high task success, with autoencoder (AE)-derived context performing robustly and outperforming a monolithic baseline. The approach reduces the amount of demonstration data required, generalizes to unseen contexts, and demonstrates dexterous rope manipulation in 3D space on real hardware, which highlights its practical value for robust deformable-object manipulation.

Abstract

Learning from demonstration (LfD) provides a convenient means to equip robots with dexterous skills when demonstration can be obtained in robot intrinsic coordinates. However, the problem of compounding errors in long and complex skills reduces its wide deployment. Since most such complex skills are composed of smaller movements that are combined, considering the target skill as a sequence of compact motor primitives seems reasonable. Here the problem that needs to be tackled is to ensure that a motor primitive ends in a state that allows the successful execution of the subsequent primitive. In this study, we focus on this problem by proposing to learn an explicit correction policy when the expected transition state between primitives is not achieved. The correction policy is itself learned via behavior cloning by the use of a state-of-the-art movement primitive learning architecture, Conditional Neural Motor Primitives (CNMPs). The learned correction policy is then able to produce diverse movement trajectories in a context dependent way. The advantage of the proposed system over learning the complete task as a single action is shown with a table-top setup in simulation, where an object has to be pushed through a corridor in two steps. Then, the applicability of the proposed method to bi-manual knotting in the real world is shown by equipping an upper-body humanoid robot with the skill of making knots over a bar in 3D space. The experiments show that the robot can perform successful knotting even when the faced correction cases are not part of the human demonstration set.
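
The control flow described in the abstract (run a primitive, check whether the expected transition state was reached, and insert a correction only when it was not) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the 2D point state, the circular critical region, the `pre_error` parameter that models an imperfect pre-MP, and all function names. In particular, `linear_primitive` is a hand-written straight-line stand-in for the learned CNMP primitives, not the authors' model.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def linear_primitive(start: Point, end: Point, steps: int = 10) -> Trajectory:
    """Hypothetical stand-in for a learned primitive: straight-line interpolation."""
    return [
        (start[0] + (end[0] - start[0]) * t / steps,
         start[1] + (end[1] - start[1]) * t / steps)
        for t in range(steps + 1)
    ]


def in_critical_region(state: Point, center: Point, radius: float) -> bool:
    """True if the state lies inside the (assumed circular) critical region C."""
    dx, dy = state[0] - center[0], state[1] - center[1]
    return (dx * dx + dy * dy) ** 0.5 <= radius


def execute_skill(
    start: Point,
    critical_center: Point,
    goal: Point,
    radius: float,
    pre_error: Point = (0.0, 0.0),
) -> List[Tuple[str, Trajectory]]:
    """Run pre-MP, insert cor-MP only if C was missed, then run post-MP."""
    # pre-MP aims at C; pre_error models execution error at the transition.
    reached = (critical_center[0] + pre_error[0],
               critical_center[1] + pre_error[1])
    segments = [("pre", linear_primitive(start, reached))]
    state = reached

    if not in_critical_region(state, critical_center, radius):
        # The context here is simply the observed state at the transition;
        # the correction is generated conditioned on it.
        context = state
        cor = linear_primitive(context, critical_center)
        segments.append(("cor", cor))
        state = cor[-1]

    segments.append(("post", linear_primitive(state, goal)))
    return segments


if __name__ == "__main__":
    for name, traj in execute_skill((0.0, 0.0), (1.0, 1.0), (2.0, 0.0),
                                    radius=0.1, pre_error=(0.5, 0.0)):
        print(name, traj[0], "->", traj[-1])
```

In this toy version the correction always returns to the center of C. In the paper's setting the correction is learned from teleoperated demonstrations and conditioned on a task context, so it can produce different trajectories for different failure cases.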

Paper Structure (13 sections, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Bi-manual knotting with Torobo in the real world.
  • Figure 2: Illustration of the proposed concept with a single critical point. The demonstrated trajectory (dashed gray curve) passes through a critical region C, which depends on the task. During robot execution, a correction movement is inserted to handle the case where the execution misses C, thereby splitting the execution into three motor primitives (MP): pre-MP, cor-MP, and post-MP. post-MP must be learned in a context-dependent manner so that it can recover from a wide range of failure cases. This is achieved by obtaining multiple correction demonstrations through teleoperation.
  • Figure 3: Conditional Neural Movement Primitive (CNMP) architecture with a context input. In the illustration, the context is provided by an autoencoder. See the Methods section for more details.
  • Figure 4: Simulation environment prepared in Gazebo.
  • Figure 5: A typical action execution to accomplish the goal.
  • ...and 4 more figures