Visual IRL for Human-Like Robotic Manipulation

Ehsan Asali, Prashant Doshi

TL;DR

Visual IRL addresses learning from human demonstrations by directly using 3D human keypoints and object locations as state features and employing adversarial inverse reinforcement learning (AIRL) to infer task-specific rewards. A neuro-symbolic dynamics mapping then transfers the learned human motion to cobots with different degrees of freedom (DoFs), using restricted forward kinematics and optimized inverse kinematics to preserve human-like end-effector trajectories. The approach is validated on two real-world tasks (onion sorting and liquid pouring) with Sawyer and KUKA cobots. Compared to the baselines, it shows superior motion smoothness and efficiency and aligns more closely with the human demonstrations. By combining perception-driven IRL with dynamics-aware retargeting, this work offers a pragmatic path toward more natural human-robot collaboration in manufacturing. It also eases the integration of cobots into industrial settings and opens avenues for extending the approach to additional tasks and DoFs.
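
The TL;DR names AIRL without spelling out its reward structure. In the standard AIRL formulation, the discriminator is D(s, a, s') = exp(f) / (exp(f) + pi(a|s)), where the state-only form is f(s, s') = g(s) + gamma * h(s') - h(s). Here, g is the learned reward and h is a shaping term. The sketch below applies that formula to a feature vector built from 3D keypoints and one object location. It rests on several assumptions. The feature layout, the linear g and h, and the helper names (state_features, linear, airl_f, airl_discriminator) are hypothetical illustrations. They are not the networks used in the paper, which this summary does not describe.

```python
import math
from typing import List, Sequence


def state_features(keypoints: Sequence[Sequence[float]],
                   object_xyz: Sequence[float]) -> List[float]:
    """Flatten K 3D keypoints plus one object position into a feature vector.

    Hypothetical layout: [x1, y1, z1, ..., xK, yK, zK, ox, oy, oz].
    """
    feats = [coord for kp in keypoints for coord in kp]
    feats.extend(object_xyz)
    return feats


def linear(weights: Sequence[float], phi: Sequence[float]) -> float:
    """Hypothetical linear function approximator: dot product w . phi."""
    return sum(w * x for w, x in zip(weights, phi))


def airl_f(phi_s, phi_next, w_g, w_h, gamma=0.99):
    """State-only AIRL term f(s, s') = g(s) + gamma * h(s') - h(s).

    g approximates the reward and h is the shaping term; both are linear here
    purely for illustration.
    """
    return linear(w_g, phi_s) + gamma * linear(w_h, phi_next) - linear(w_h, phi_s)


def airl_discriminator(f_value: float, pi_a_given_s: float) -> float:
    """D = exp(f) / (exp(f) + pi(a|s)), computed as sigmoid(f - log pi(a|s))."""
    return 1.0 / (1.0 + math.exp(-(f_value - math.log(pi_a_given_s))))
```

During AIRL training, D is pushed toward 1 on expert (human) transitions and toward 0 on transitions generated by the current policy. After training, g(s) serves as the recovered reward.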

Abstract

We present a novel method for collaborative robots (cobots) to learn manipulation tasks and perform them in a human-like manner. Our method falls under the learn-from-observation (LfO) paradigm, in which robots learn to perform tasks by observing human actions. This facilitates quicker integration into industrial settings than programming from scratch. We introduce Visual IRL, which uses the RGB-D keypoints in each frame of the observed human task performance directly as state features for inverse reinforcement learning (IRL). The inversely learned reward function, which maps keypoints to reward values, is transferred from the human to the cobot using a novel neuro-symbolic dynamics model that maps human kinematics to the cobot arm. This model enables similar end-effector positioning while minimizing joint adjustments, aiming to preserve the natural dynamics of human motion in robotic manipulation. Previous techniques focus only on end-effector placement. In contrast, our method maps multiple joint angles of the human arm to the corresponding cobot joints and then uses an inverse kinematics model to minimally adjust those joint angles for accurate end-effector positioning. We evaluate this approach on two realistic manipulation tasks. The first is produce processing, which involves picking, inspecting, and placing onions based on whether they are blemished. The second is liquid pouring, in which the robot picks up bottles, pours their contents into designated containers, and disposes of the empty bottles. Our results demonstrate advances in human-like robotic manipulation, leading to greater human-robot compatibility in manufacturing applications.
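
The abstract describes a two-stage transfer. Human joint angles are first mapped one-to-one onto cobot joints, and an IK model then makes small corrections so the end effector reaches its target. The sketch below illustrates that idea on a toy planar two-link arm. Everything in it is a hypothetical stand-in: the link lengths, the JOINT_MAP table (joint, sign, offset), and the helper names are invented. Damped least-squares IK is swapped in as a generic local solver, because the summary does not say which IK model the paper uses. Starting the solver from the mapped human angles keeps it on the same elbow branch and usually near the human-like posture. However, damped least squares does not strictly minimize the total joint change.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical link lengths (meters) of a planar two-link cobot arm.
LINKS = (0.4, 0.35)

# Hypothetical one-to-one symbolic mapping: human joint -> (cobot joint, sign, offset).
JOINT_MAP = {
    "shoulder": ("j0", 1.0, 0.0),
    "elbow": ("j1", 1.0, 0.0),
}


def map_human_to_cobot(human: Dict[str, float]) -> List[float]:
    """Apply the symbolic joint mapping to get initial cobot joint angles."""
    cobot = {}
    for h_name, (c_name, sign, offset) in JOINT_MAP.items():
        cobot[c_name] = sign * human[h_name] + offset
    return [cobot["j0"], cobot["j1"]]


def fk(q: Sequence[float]):
    """Forward kinematics: end-effector (x, y) of the planar arm."""
    l1, l2 = LINKS
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return x, y


def refine(q, target, tol=1e-4, damping=0.05, max_iter=200):
    """Damped least-squares IK started from the mapped angles.

    Iterates until the end-effector position error falls below tol.
    """
    q = list(q)
    l1, l2 = LINKS
    for _ in range(max_iter):
        x, y = fk(q)
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tol:
            break
        s1, c1 = math.sin(q[0]), math.cos(q[0])
        s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
        # Jacobian J = [[a, b], [c, d]] of (x, y) with respect to (q0, q1).
        a, b = -l1 * s1 - l2 * s12, -l2 * s12
        c, d = l1 * c1 + l2 * c12, l2 * c12
        # Solve (J J^T + lambda^2 I) [u, v] = e, then dq = J^T [u, v].
        m11 = a * a + b * b + damping ** 2
        m12 = a * c + b * d
        m22 = c * c + d * d + damping ** 2
        det = m11 * m22 - m12 * m12
        u = (m22 * ex - m12 * ey) / det
        v = (-m12 * ex + m11 * ey) / det
        q[0] += a * u + c * v
        q[1] += b * u + d * v
    return q
```

For example, map_human_to_cobot({"shoulder": 0.5, "elbow": 0.6}) yields [0.5, 0.6]. Refining toward a nearby target then changes each joint only slightly.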

Paper Structure

This paper contains 13 sections, 7 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Overview of the proposed method’s pipeline. Learning the task from a human expert begins by capturing the RGB-D stream of the human's task performance. Object location prediction and human keypoint detection models extract state parameters for AIRL, which learns a reward function that reflects the expert's preferences. When deployed on the cobot, the learned reward function is used to compute a policy for performing the task. This policy is input into the Neuro-Symbolic Dynamics Mapping model, which maps human joints to cobot joints and generates initial joint angles for the cobot. These angles are refined using the cobot joint IK model to adjust the end-effector position for accurate manipulation, allowing the cobot to perform the task successfully.
  • Figure 2: An overview of our Neuro-Symbolic Dynamics Mapping architecture. The wrist's 3D coordinates are input into the human joint IK model to obtain human joint coordinates. These coordinates are then converted into cobot joint angles through symbolic mapping. The initial cobot joint angles are iteratively refined using the cobot's FK and IK models. Once the joint angles meet the desired threshold, they are finalized and used by the robot to reach the target in a human-like manner.
  • Figure 3: One-to-one mapping of Sawyer and KUKA LBR iisy joints to the human joints. Human joints are shown in blue.
  • Figure 4: Comparison of motion dynamics among RRT-Connect, our proposed model, and the human expert. The proposed model closely aligns with human motions, while the baseline exhibits irregularities.
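
Figure 4 compares motion dynamics qualitatively. One common way to quantify smoothness is a jerk-based cost computed from uniformly sampled trajectories. The sketch below is an assumption, not the metric reported in the paper, which this summary does not name. It estimates jerk with forward third-order finite differences and averages the squared values over all samples and dimensions. The function names are hypothetical.

```python
from typing import List, Sequence


def third_differences(xs: Sequence[float], dt: float) -> List[float]:
    """Approximate jerk: (x[i+3] - 3x[i+2] + 3x[i+1] - x[i]) / dt^3."""
    return [(xs[i + 3] - 3 * xs[i + 2] + 3 * xs[i + 1] - xs[i]) / dt ** 3
            for i in range(len(xs) - 3)]


def mean_squared_jerk(traj: Sequence[Sequence[float]], dt: float) -> float:
    """Average squared jerk over all samples and dimensions.

    traj is a list of samples taken every dt seconds, each a vector
    (for example end-effector x, y, z or a set of joint angles).
    """
    dims = len(traj[0])
    total, count = 0.0, 0
    for k in range(dims):
        series = [p[k] for p in traj]
        for j in third_differences(series, dt):
            total += j * j
            count += 1
    return total / count if count else 0.0
```

Lower values indicate smoother motion. A trajectory with abrupt velocity changes typically scores higher than one that varies smoothly, and a linear or quadratic trajectory scores (numerically) zero.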