Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards

Irmak Guzey; Yinlong Dai; Georgy Savva; Raunaq Bhirangi; Lerrel Pinto

TL;DR

HuDOR tackles the human-to-robot dexterity gap by deriving object-centric, trajectory-matching rewards from a single in-scene human video and applying online residual RL to a four-fingered robot hand. The approach combines a VR-based data capture pipeline, pose transfer via inverse kinematics, and an object-point-tracking reward built from language-grounded object masks and CoTracker trajectories. Empirical results across four tasks show that HuDOR improves substantially over offline baselines and highlight the importance of online corrections for high-precision manipulation, with varying generalization to new objects and larger workspaces. The method enables online, teleoperation-free learning of dexterous policies, opening practical pathways for adapting human demonstrations to diverse robot morphologies in real time.

Abstract

Training robots directly from human videos is an emerging area in robotics and computer vision. While there has been notable progress with two-fingered grippers, learning autonomous tasks for multi-fingered robot hands in this way remains challenging. A key reason for this difficulty is that a policy trained on human hands may not directly transfer to a robot hand due to morphology differences. In this work, we present HuDOR, a technique that enables online fine-tuning of policies by directly computing rewards from human videos. Importantly, this reward function is built using object-oriented trajectories derived from off-the-shelf point trackers, providing meaningful learning signals despite the morphology gap and visual differences between human and robot hands. Given a single video of a human solving a task, such as gently opening a music box, HuDOR enables our four-fingered Allegro hand to learn the task with just an hour of online interaction. Our experiments across four tasks show that HuDOR achieves a 4x improvement over baselines. Code and videos are available on our website, https://object-rewards.github.io.
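To make the idea of an object-oriented, trajectory-matching reward concrete, the following is a minimal sketch and not the paper's implementation. It assumes that a point tracker has already produced arrays of tracked object points of shape `(T, N, 2)` for the human video and for a robot rollout, and that both are sampled at the same `T` frames. The function names `object_motion` and `trajectory_reward` are hypothetical, and the choice of mean point displacement with a root-mean-squared error is an illustrative assumption.

```python
import numpy as np


def object_motion(points):
    """Summarize object motion from tracked points.

    points: array of shape (T, N, 2), pixel coordinates of N tracked
    points over T frames (assumed to come from a point tracker).
    Returns an array of shape (T, 2): the mean displacement of the
    points relative to the first frame.
    """
    points = np.asarray(points, dtype=float)
    return (points - points[0]).mean(axis=1)


def trajectory_reward(human_points, robot_points):
    """Hypothetical reward: negative RMSE between object motions.

    Only the object's motion is compared, so the appearance of the
    hand (human or robot) does not enter the reward.
    """
    human = object_motion(human_points)
    robot = object_motion(robot_points)
    if human.shape != robot.shape:
        raise ValueError("trajectories must cover the same number of frames")
    # Per-frame squared distance between the two motion summaries.
    sq_dist = np.sum((human - robot) ** 2, axis=1)
    return -float(np.sqrt(np.mean(sq_dist)))
```

Because the comparison uses displacements relative to the first frame, a rollout in which the object moves the same way as in the human video scores the maximum reward of zero even if the object sits at a different pixel location. An online RL loop would then try to increase this value across episodes.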
