Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning

Wenlong Huang, Igor Mordatch, Pieter Abbeel, Deepak Pathak

TL;DR

This work tackles generalization in in-hand dexterous manipulation with a geometry-aware multi-task learning framework. A vanilla multi-task RL policy trained across many objects remains competitive with single-task oracles, and it improves further when conditioned on a frozen point-cloud encoder that captures object geometry, enabling zero-shot generalization to unseen shapes and sizes. A single policy performs robustly across over 100 real-world objects, performance scales as more objects are included in training, and with the object representation the policy often outperforms object-specific baselines on held-out objects. The authors release a simulated 114-object benchmark to spur future research and highlight practical design insights, such as freezing the encoder to preserve geometry-sensitive representations. Overall, the work is a step toward a general-purpose dexterous manipulation controller that adapts to diverse objects with minimal task-specific tailoring.

Abstract

Dexterous manipulation of arbitrary objects, a fundamental daily task for humans, has been a grand challenge for autonomous robotic systems. Although data-driven approaches using reinforcement learning can develop specialist policies that discover behaviors to control a single object, they often exhibit poor generalization to unseen ones. In this work, we show that policies learned by existing reinforcement learning algorithms can in fact be generalist when combined with multi-task learning and a well-chosen object representation. We show that a single generalist policy can perform in-hand manipulation of over 100 geometrically-diverse real-world objects and generalize to new objects with unseen shape or size. Interestingly, we find that multi-task learning with object point cloud representations not only generalizes better but even outperforms the single-object specialist policies on both training as well as held-out test objects. Video results at https://huangwl18.github.io/geometry-dex
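As a rough illustration of what "multi-task learning with object point cloud representations" can look like at the input level, the sketch below builds a toy frozen encoder and concatenates its output with the robot state. Everything here is an assumption made for illustration: the function names (`make_encoder`, `policy_input`), the embedding size, the random fixed weights, and the per-point-map-plus-max-pool design are hypothetical and are not taken from the paper.

```python
import random


def make_encoder(embed_dim=8, seed=0):
    """Return a toy frozen encoder: per-point linear map + ReLU, then max-pool.

    The fixed random weights stand in for a pretrained encoder whose
    parameters are not updated during policy training (an assumption).
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(embed_dim)]

    def encode(points):
        # points: list of (x, y, z) tuples sampled from the object surface.
        features = [
            [max(0.0, sum(w * c for w, c in zip(row, p))) for row in weights]
            for p in points
        ]
        # Max-pooling over points makes the embedding independent of point order.
        return [max(column) for column in zip(*features)]

    return encode


def policy_input(robot_state, object_embedding):
    """Concatenate the robot state with the object geometry embedding."""
    return list(robot_state) + list(object_embedding)


encode = make_encoder()
cloud = [(0.0, 0.1, 0.2), (0.3, -0.1, 0.0), (-0.2, 0.2, 0.1)]  # made-up points
obs = policy_input([0.5, -0.5], encode(cloud))
```

Because the encoder is shared and fixed, the same policy network can be fed observations from any object; only the embedding changes from task to task.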

Paper Structure

This paper contains 25 sections, 3 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Our goal in this work is to train a single policy that can perform in-hand manipulation on a large number of objects. Surprisingly, we show that simple multi-task learning combined with an appropriate representation not only achieves this goal but also outperforms the single-task oracles on both training and unseen objects.
  • Figure 2: We show that simple extensions to existing RL algorithms can produce geometry-aware dexterous manipulation policies that are robust to over 100 diverse objects. We first train an object representation encoder using object point clouds (left). Then we perform multi-task RL training on a large number of objects leveraging the encoded object representation (right).
  • Figure 3: Success rate difference between the geometry-aware multi-task policy and individual oracles on the 85 training objects, calculated as $\Delta S = (S_{\text{ours}} - S_{\text{oracle}})$. A single geometry-aware multi-task policy attains even better performance than the individual single-task oracles on most training objects, demonstrating that the policy can leverage skills learned from many tasks to become stronger overall. The reported success rates are averaged across 100 episodes.
  • Figure 4: Average success rate across $85$ training objects and across $29$ held-out objects. The plot shows that multi-task joint training can lead to a surprisingly robust policy on both training and test objects, with performance similar to the average of the individual single-task oracles trained for each object. Furthermore, when combined with an object representation, a joint policy can even outperform the oracles on held-out objects in a completely zero-shot manner. The reported success rates are averaged across 425 episodes for all training objects and 145 episodes for all held-out objects.
  • Figure 5: Visualization of held-out objects ranked by the performance gain of the geometry-aware policy, calculated as $\Delta S = (S_{\text{ours}} - S_{\text{vanilla}})$. The gains are highest for objects with irregular shapes and lowest for medium-sized and spherical objects, showing that the policy can effectively leverage the object representation to adopt specific strategies even for challenging unseen objects.
  • ...and 2 more figures
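
The per-object gap $\Delta S$ used in the Figure 3 and Figure 5 captions is a difference of two success rates, and the Figure 5 ranking sorts objects by that gap. A minimal sketch of the arithmetic follows. The helper names and the episode outcomes are hypothetical and chosen only to make the example easy to check by hand; they are not results from the paper.

```python
def success_rate(outcomes):
    """Fraction of successful episodes; outcomes is a list of booleans."""
    return sum(outcomes) / len(outcomes)


def success_rate_gaps(ours, baseline):
    """Per-object gap S_ours - S_baseline for objects present in both dicts."""
    return {
        name: success_rate(ours[name]) - success_rate(baseline[name])
        for name in ours
        if name in baseline
    }


def rank_by_gain(gaps):
    """Sort objects from largest to smallest gain, as in a ranked visualization."""
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Made-up episode outcomes for two hypothetical objects (not paper data).
ours = {"mug": [True, True, True, False], "ball": [True, True]}
baseline = {"mug": [True, False, False, False], "ball": [True, True]}

gaps = success_rate_gaps(ours, baseline)
ranking = rank_by_gain(gaps)
```

With these made-up outcomes the mug has a gap of 0.5 and the ball a gap of 0.0, so the mug ranks first.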