Manipulate as Human: Learning Task-oriented Manipulation Skills by Adversarial Motion Priors

Ziqi Ma, Changda Tian, Yue Gao

TL;DR

HMAMP addresses the challenge of learning human-like tool manipulation by integrating Adversarial Motion Priors with reinforcement learning to jointly optimize task performance and motion style. The method uses a discriminator-based style reward and a task reward, trained with PPO, and leverages both real human-motion clips and simulation data to shape realistic trajectories. Key contributions include a keypoint-based formulation for tools and the environment, a hybrid reward $r_t = \alpha^g r^g_t + \beta^s r^s_t$ that weights a task (goal) reward $r^g_t$ against a style reward $r^s_t$, and demonstrated improvements on hammering in simulation together with real-robot transfer on a Kinova Gen3. The work advances intuitive human–robot interaction by showing that human-like manipulation skills can be learned from accessible video data and transferred to physical hardware with domain randomization.
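To make the hybrid reward concrete, the following is a minimal Python sketch rather than the authors' implementation. The weights `alpha_g` and `beta_s` are placeholder values, and the mapping from discriminator score to style reward is borrowed from the original AMP formulation, $r^s = \max[0,\, 1 - 0.25\,(d - 1)^2]$; the summary above does not state which mapping HMAMP uses, so treat that part as an assumption.

```python
def style_reward(d_score: float) -> float:
    """Map a discriminator score to a bounded style reward in [0, 1].

    Assumption: the clipped quadratic form from the original AMP work;
    HMAMP's exact mapping is not given in this summary.
    """
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)


def total_reward(task_r: float, style_r: float,
                 alpha_g: float = 0.5, beta_s: float = 0.5) -> float:
    """Hybrid reward r_t = alpha^g * r^g_t + beta^s * r^s_t.

    The default weights are hypothetical placeholders, not values
    reported by the paper.
    """
    return alpha_g * task_r + beta_s * style_r


if __name__ == "__main__":
    # A score near +1 means the discriminator judges the motion as human-like.
    r_s = style_reward(0.6)
    print(total_reward(task_r=0.8, style_r=r_s))
```

Raising `beta_s` relative to `alpha_g` would push the policy toward human-like motion at the possible expense of task success, which is the trade-off the two reward terms are meant to balance.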

Abstract

In recent years, there has been growing interest in developing robots and autonomous systems that can interact with humans in a more natural and intuitive way. One of the key challenges in achieving this goal is to enable these systems to manipulate objects and tools in a manner similar to that of humans. In this paper, we propose a novel approach for learning human-style manipulation skills using adversarial motion priors, which we name HMAMP. The approach leverages adversarial networks to model the complex dynamics of tool and object manipulation, as well as the aim of the manipulation task. The discriminator is trained on a combination of real-world human motion data and simulation data generated by the agent, and it guides the policy toward realistic motion trajectories that match the statistical properties of human motion. We evaluated HMAMP on one challenging manipulation task, hammering, and the results indicate that HMAMP is capable of learning human-style manipulation skills that outperform current baseline methods. Additionally, we demonstrate that HMAMP has potential for real-world applications by performing hammering tasks on a real robot arm. Overall, HMAMP represents a significant step towards robots and autonomous systems that interact with humans more naturally and intuitively, by learning to manipulate tools and objects in a manner similar to how humans do.
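The discriminator training described above can be illustrated with a short sketch. It assumes the least-squares objective used in the original AMP work, with a target of +1 for human clips and -1 for policy rollouts; the abstract does not say which objective HMAMP uses, and the gradient penalty that AMP adds is omitted here. The function name and inputs are hypothetical.

```python
from typing import Sequence


def lsgan_discriminator_loss(scores_human: Sequence[float],
                             scores_policy: Sequence[float]) -> float:
    """Least-squares discriminator loss (assumed objective, not confirmed for HMAMP).

    scores_human:  discriminator outputs on transitions from human motion clips.
    scores_policy: discriminator outputs on transitions generated by the agent.
    """
    # Push scores on human data toward +1.
    human_term = sum((s - 1.0) ** 2 for s in scores_human) / len(scores_human)
    # Push scores on policy data toward -1.
    policy_term = sum((s + 1.0) ** 2 for s in scores_policy) / len(scores_policy)
    return human_term + policy_term


if __name__ == "__main__":
    # An undecided discriminator (all zeros) incurs a loss of 2.0;
    # a perfect one incurs 0.0.
    print(lsgan_discriminator_loss([0.0, 0.0], [0.0, 0.0]))
    print(lsgan_discriminator_loss([1.0, 1.0], [-1.0, -1.0]))
```

As the policy's rollouts become harder to distinguish from the human clips, the discriminator's scores on them drift toward +1, which is what raises the style reward during training.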

Paper Structure

This paper contains 24 sections, 4 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Difference in hammering between humans and robots. When humans hammer a nail, they first swing the hammer opposite to the striking direction to store energy, while robots focus only on completing the task and ignore this important action.
  • Figure 2: Framework of HMAMP. From human manipulation video clips, we extract the keypoints of the human arm and the manipulation tool. We then align keypoints between the robot arm in simulation and the real-world human motion clips. The AMP discriminator distinguishes whether an action sequence is real human expert motion or was generated by the policy network. The AMP reward and the task reward for the manipulation task are summed to form the total reward for RL training.
  • Figure 3: Direct mapping between human and robot arm. Some joints and the gripper of a Kinova Gen3 are mapped to act as human hip, elbow, wrist, and hand.
  • Figure 4: The training process of HMAMP and the end-effector tracking comparison between HMAMP and the baselines. Panel (a) shows the evolution of the reach reward and the knock force reward during training. Panel (b) shows the discriminator loss and gradient during training. Together, the two panels show the competition and balance between the style reward and the goal reward: in the early stage the goal reward has a strong guiding effect, while in the late stage the AMP discriminator converges quickly, giving the robot's trajectory a human style. Panel (c) shows the movement trajectory of the robot arm's end-effector in Cartesian space. The motion trajectory obtained by HMAMP is the most similar to the human expert's trajectory.
  • Figure 5: Experiments in simulation and in the real world. The first row shows the human knocking motion clips used as motion priors. The second row shows the HMAMP policy in simulation, where the hammering task is completed successfully with the desired manipulation trajectory. The third row shows HMAMP deployed in the real world on a Kinova Gen3, and the fourth row shows details of hammering a nail in the real world.