Visual IRL for Human-Like Robotic Manipulation

Ehsan Asali; Prashant Doshi

TL;DR

Visual IRL learns from human demonstrations by using 3D human keypoints and object locations directly as state features and applying adversarial inverse reinforcement learning (AIRL) to infer task-specific rewards. A neuro-symbolic dynamics mapping then transfers the learned human motion to cobots with different degrees of freedom (DoFs), using restricted forward kinematics and optimized inverse kinematics to preserve human-like end-effector trajectories. The approach is validated on two real-world tasks (onion sorting and liquid pouring) with Sawyer and KUKA cobots, where it shows smoother, more efficient motion and closer alignment with human demonstrations than the baselines. By combining perception-driven IRL with dynamics-aware retargeting, the work offers a pragmatic path toward more natural human-robot collaboration in manufacturing and opens avenues for extension to additional tasks and DoFs.

Abstract

We present a novel method for collaborative robots (cobots) to learn manipulation tasks and perform them in a human-like manner. Our method falls under the learn-from-observation (LfO) paradigm, where robots learn to perform tasks by observing human actions, which facilitates quicker integration into industrial settings compared to programming from scratch. We introduce Visual IRL, which uses the RGB-D keypoints in each frame of the observed human task performance directly as state features that are input to inverse reinforcement learning (IRL). The inversely learned reward function, which maps keypoints to reward values, is transferred from the human to the cobot using a novel neuro-symbolic dynamics model that maps human kinematics to the cobot arm. This model allows similar end-effector positioning while minimizing joint adjustments, aiming to preserve the natural dynamics of human motion in robotic manipulation. In contrast with previous techniques that focus on end-effector placement only, our method maps multiple joint angles of the human arm to the corresponding cobot joints. It then uses an inverse kinematics model to minimally adjust the joint angles for accurate end-effector positioning. We evaluate the performance of this approach on two realistic manipulation tasks. The first task is produce processing, which involves picking, inspecting, and placing onions based on whether they are blemished. The second task is liquid pouring, where the robot picks up bottles, pours the contents into designated containers, and disposes of the empty bottles. Our results demonstrate advances in human-like robotic manipulation, leading to greater human-robot compatibility in manufacturing applications.
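The idea of seeding the cobot with human-derived joint angles and then adjusting them minimally through inverse kinematics can be illustrated with a small sketch. This is not the authors' neuro-symbolic model: the planar three-link arm, the link lengths in `LINKS`, the damped least-squares update, and the function names are all assumptions chosen only to keep the example self-contained. Each update is the (approximately) smallest joint change that reduces the end-effector error, so the refined configuration stays close to the seed.

```python
import math

# Hypothetical link lengths (metres) of a planar 3-joint arm; not from the paper.
LINKS = (0.4, 0.3, 0.2)


def forward_kinematics(q, links=LINKS):
    """Return the end-effector (x, y) for joint angles q (radians)."""
    x = y = angle = 0.0
    for qi, li in zip(q, links):
        angle += qi
        x += li * math.cos(angle)
        y += li * math.sin(angle)
    return x, y


def jacobian(q, links=LINKS):
    """Return the 2 x n position Jacobian as two rows (d x / d q, d y / d q)."""
    n = len(q)
    cum, angle = [], 0.0
    for qi in q:
        angle += qi
        cum.append(angle)
    row_x = [-sum(links[k] * math.sin(cum[k]) for k in range(j, n)) for j in range(n)]
    row_y = [sum(links[k] * math.cos(cum[k]) for k in range(j, n)) for j in range(n)]
    return row_x, row_y


def refine_joint_angles(q_seed, target, damping=1e-3, iters=200, tol=1e-6):
    """Damped least-squares IK started from q_seed (e.g. human-mapped angles).

    Each step applies dq = J^T (J J^T + damping * I)^-1 e, where e is the
    end-effector position error.
    """
    q = list(q_seed)
    for _ in range(iters):
        x, y = forward_kinematics(q)
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tol:
            break
        jx, jy = jacobian(q)
        # Entries of the symmetric 2 x 2 matrix A = J J^T + damping * I.
        a = sum(u * u for u in jx) + damping
        b = sum(u * v for u, v in zip(jx, jy))
        d = sum(v * v for v in jy) + damping
        det = a * d - b * b  # positive because damping > 0
        # Solve A w = e in closed form, then map back with dq = J^T w.
        wx = (d * ex - b * ey) / det
        wy = (a * ey - b * ex) / det
        q = [qi + u * wx + v * wy for qi, u, v in zip(q, jx, jy)]
    return q


if __name__ == "__main__":
    q_human = [0.6, 0.5, 0.4]   # joint angles mapped from a human arm (made up)
    target = (0.5, 0.65)        # desired end-effector position (made up)
    q_robot = refine_joint_angles(q_human, target)
    print("refined joints:", [round(v, 4) for v in q_robot])
    print("end effector:  ", forward_kinematics(q_robot))
```

Starting the solver from the human-mapped angles, rather than from a fixed home pose, is what keeps the final posture similar to the demonstrated one; a real cobot would also need joint limits and a full 3D kinematic model, which this sketch omits.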
