Learning Prehensile Dexterity by Imitating and Emulating State-only Observations

Yunhai Han, Zhenyang Chen, Kyle A Williams, Harish Ravichandar

TL;DR

This work tackles learning dexterous prehensile manipulation from state-only observations without action labels or task-specific rewards. It introduces CIMER, a two-stage approach that first builds a Motion Generation Policy $\Phi$ via a Koopman-based lifted dynamical system to provide a motion prior, then uses a Motion Refinement Policy $\Psi$ trained with RL to reenact the object motion through emulation, aided by a PD controller. Across three challenging tasks, CIMER delivers superior sample efficiency, realistic and stable motions, and strong zero-shot generalization to 17 novel objects from the YCB dataset, often outperforming action-label expert policies. The results suggest that decoupling motion generation from refinement and focusing on object-motion emulation yields robust, intervention-free dexterous manipulation capabilities with potential for real-world deployment.
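To make the idea of a Koopman-based lifted dynamical system concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation. The lifting function (state, element-wise squares, constant), the two-dimensional toy "hand and object" dynamics, and every function name below are assumptions chosen for illustration. The sketch fits a linear operator `K` on lifted states by least squares, then rolls it out from a new initial condition to produce a motion prior.

```python
import numpy as np

def lift(x):
    """Hypothetical lifting function: the state, its element-wise squares, and a constant."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, x ** 2, [1.0]])

def fit_koopman(demos):
    """Least-squares fit of K so that lift(x[t+1]) ~= K @ lift(x[t]) over all demos."""
    z_now = np.array([lift(x) for demo in demos for x in demo[:-1]])
    z_next = np.array([lift(x) for demo in demos for x in demo[1:]])
    # Solve z_now @ K.T ~= z_next in the least-squares sense.
    k_transposed, *_ = np.linalg.lstsq(z_now, z_next, rcond=None)
    return k_transposed.T

def rollout(K, x0, horizon):
    """Generate a motion prior by evolving the lifted state linearly from x0."""
    x0 = np.asarray(x0, dtype=float)
    z = lift(x0)
    states = [x0]
    for _ in range(horizon):
        z = K @ z                          # linear evolution in the lifted space
        states.append(z[: x0.size].copy())  # first entries of z are the original state
    return np.array(states)

def toy_step(x):
    """Toy stand-in for demonstrations: x[0] plays the hand, x[1] the object it drives."""
    hand, obj = x
    return np.array([0.9 * hand, 0.7 * obj + 0.3 * hand ** 2])

def simulate(x0, horizon):
    """Roll the toy dynamics forward to create a state-only demonstration."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(horizon):
        states.append(toy_step(states[-1]))
    return np.array(states)

# State-only demonstrations: no action labels are used anywhere in the fit.
demos = [simulate([h, o], 15) for h in (0.5, 1.0, 1.5) for o in (-1.0, 0.0, 1.0)]
K = fit_koopman(demos)
prior = rollout(K, [1.2, 0.4], 10)
```

In this toy setting the object depends nonlinearly on the hand, yet the evolution becomes linear once the squared hand state is part of the lifted vector. That is the property a lifted linear model relies on. Real hand and object states would need a richer lifting and would generally be fitted only approximately.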

Abstract

When humans acquire physical skills (e.g., tennis) from experts, we tend to first learn from merely observing the expert. But this is often insufficient. We then engage in practice, where we try to emulate the expert and ensure that our actions produce similar effects on our environment. Inspired by this observation, we introduce Combining IMitation and Emulation for Motion Refinement (CIMER) -- a two-stage framework to learn dexterous prehensile manipulation skills from state-only observations. CIMER's first stage involves imitation: simultaneously encode the complex interdependent motions of the robot hand and the object in a structured dynamical system. This results in a reactive motion generation policy that provides a reasonable motion prior, but lacks the ability to reason about contact effects due to the lack of action labels. The second stage involves emulation: learn a motion refinement policy via reinforcement that adjusts the robot hand's motion prior such that the desired object motion is reenacted. CIMER is both task-agnostic (no task-specific reward design or shaping) and intervention-free (no additional teleoperated or labeled demonstrations). Detailed experiments with prehensile dexterity reveal that i) imitation alone is insufficient, but adding emulation drastically improves performance, ii) CIMER outperforms existing methods in terms of sample efficiency and the ability to generate realistic and stable motions, iii) CIMER can either zero-shot generalize or learn to adapt to novel objects from the YCB dataset, even outperforming expert policies trained with action labels in most cases. Source code and videos are available at https://sites.google.com/view/cimer-2024/.
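The emulation stage can be pictured with a small Python sketch, given here under explicit assumptions rather than as the authors' code. It assumes the refinement is an additive, clipped correction to the hand's motion prior, that the refined value is used as the target of a standard PD law, and that the reward is the negative distance between the achieved and reference object states. The function names, gains, and clipping bound are all hypothetical.

```python
import math

def refine_target(prior_target, residual, max_residual=0.1):
    """Add a clipped per-joint correction to the motion prior (additive form is an assumption)."""
    return [p + max(-max_residual, min(max_residual, r))
            for p, r in zip(prior_target, residual)]

def pd_torque(q, q_vel, q_target, kp=20.0, kd=2.0):
    """Standard PD law per joint: kp * position error minus kd * joint velocity."""
    return [kp * (qt - qi) - kd * vi for qi, vi, qt in zip(q, q_vel, q_target)]

def object_tracking_reward(obj_state, obj_reference):
    """Hypothetical task-agnostic reward: negative Euclidean distance to the reference object state."""
    return -math.dist(obj_state, obj_reference)

# One control step: the prior proposes joint targets, the refinement policy's
# output (here a fixed stand-in) adjusts them, and the PD law produces torques.
target = refine_target([0.5, 1.0], [0.3, -0.05])
torque = pd_torque([0.4, 1.0], [0.0, 0.1], target)
reward = object_tracking_reward([0.0, 0.0, 0.0], [3.0, 4.0, 0.0])
```

Because the reward in this sketch depends only on how closely the object follows its reference motion, the same function could in principle be reused across tasks. This is one way to read the task-agnostic claim, not a statement of the paper's exact reward.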

Paper Structure (22 sections, 5 equations, 11 figures, 2 tables, 1 algorithm)

Figures (11)

  • Figure 1: CIMER is a task-agnostic and intervention-free framework to learn dexterous manipulation skills from state-only observations by learning to first generate interdependent desired motions of the robot hand and the object (Imitation), and then refine the generated robot motion in order to reenact the learned object motion (Emulation).
  • Figure 2: CIMER generates a motion prior based on initial conditions and refines it based on context to generate PD targets for the hand.
  • Figure 3: We evaluate on three dexterous prehensile skills from Rajeswaran et al. (RSS 2018): Tool Use (left), Object Relocation (center), and Door Opening (right).
  • Figure 4: Emulating the observed object motion is significantly more effective than emulating the observed hand motion.
  • Figure 5: Intuitive refinements emerge from CIMER's emulation. Top: Hand ensures hammer hits closer to nail's center, and applies larger driving force; Bottom Left: Fingers exert more force to ensure firmer grasps and stable transport; Bottom Right: Hand rotates faster to boost momentum while turning door handle (enclosed by dotted lines).
  • ...and 6 more figures